Abstract: There is a fundamental relationship between belief
propagation and maximum a posteriori decoding. A decoding
algorithm, which we call the Maxwell decoder, is introduced and
provides a constructive description of this relationship. Both
the algorithm itself and the analysis of the new decoder are
reminiscent of the Maxwell construction in thermodynamics. This
paper investigates in detail the case of transmission over the
binary erasure channel, while the extension to general binary
memoryless channels is discussed in a companion paper.

Index Terms: belief propagation, maximum a posteriori, maximum
likelihood, Maxwell construction, threshold, phase transition,
Area Theorem, EXIT curve, entropy



I. Introduction 

It is a key result, and the starting point of iterative coding,
that belief propagation (BP) is optimal on trees. See, e.g., 
[5]-[8]. However, trees with bounded state size appear not to 
be powerful enough models to allow transmission arbitrarily 
close to capacity. For instance, it is known that in the setting 
of standard binary Tanner graphs the error probability of codes 
defined on trees is lower bounded by a constant which only 
depends on the channel and the rate of the code [9], [10]. The 
general wisdom is therefore to apply BP decoding to graphs 
with loops and to consider this type of decoding as a (typically) 
strictly suboptimal attempt to perform maximum a posteriori 
(MAP) bit decoding. One would therefore not expect any link 
between the BP and the MAP decoder except for the obvious 
suboptimality of the BP decoder. 

This contribution demonstrates that there is a fundamental 
relationship between BP and MAP decoding which appears in 
the limit of large blocklengths. This relationship is furnished 
by the so-called Maxwell (M) decoder. The M decoder com- 
bines the BP decoder with a "guessing" device to perform 
MAP decoding. It is possible to analyze the performance 
of the M decoder in terms of the EXIT curve introduced 
in [11]. This analysis leads to a precise characterization of 
how difficult it is to convert the BP decoder into a MAP 
decoder and this "gap" between the MAP and BP decoder has 
a pleasing graphical interpretation in terms of an area under the 
EXIT curve. Further, the MAP threshold is determined by a
balance between two areas representing the number of guesses
and the reduction in uncertainty, respectively. The analysis
also gives rise to a generalized Area Theorem, see also [12],
and it provides an alternative tool for proving area-like results. 

The concept of a "BP decoder with guesses" itself is not 
new. In [13] the authors introduced such a decoder in order to 
improve the performance of the BP decoder. Our motivation 
though is quite different. Whereas, from a practical point 
of view, such enhancements work best for relatively small 
code lengths, or to clean up error floors, we are interested 
in the asymptotic setting in which the unexpected relationship 
between the MAP decoder and the BP decoder emerges. 

A. Preliminaries 

Assume that transmission takes place over a binary erasure
channel with parameter $\epsilon$, call it BEC($\epsilon$). More precisely,
the transmitted bit $X_i$ at time $i$, $X_i \in \mathcal{X} = \{0, 1\}$, is
erased with probability $\epsilon$. The channel output is the random
variable $Y_i$, which takes values in $\mathcal{Y} = \{0, *, 1\}$. To
be concrete, we will exemplify all statements using Low-Density
Parity-Check (LDPC) code ensembles [14]. However,
the results extend to other ensembles like, e.g., Generalized
LDPC or turbo codes, and we will state the results in a
general form. For an in-depth introduction to the analysis of
LDPC ensembles see, e.g., [15]-[18]. For the convenience of the
reader, and to settle notation, let us briefly review some key
statements.

The degree distribution (dd) pair
$(\lambda(x), \rho(x)) = \left(\sum_j \lambda_j x^{j-1}, \sum_j \rho_j x^{j-1}\right)$
represents the degree distribution of the graph from the edge
perspective. We consider the ensemble LDPC$(\lambda, \rho, n)$ of such
graphs of length $n$ and we are interested in its asymptotic average
performance (when the blocklength $n \to \infty$). This ensemble can
equivalently be described by
$\Xi = (\Lambda(x), \Gamma(x)) = \left(\sum_j \Lambda_j x^{j}, \sum_j \Gamma_j x^{j}\right)$,
which is the dd pair from the node perspective (see footnote 2). An
important characteristic of the ensemble LDPC$(\lambda, \rho, n)$ is the
design rate
$r = 1 - \int_0^1 \rho(x)\,dx \big/ \int_0^1 \lambda(x)\,dx = 1 - \Lambda'(1)/\Gamma'(1)$.
We will write $r = r(\lambda, \rho)$ or $r = r(\Lambda, \Gamma)$ whenever we
regard the design rate as a function of the degree distribution pair.
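As a small illustration of the two representations, the following sketch converts an edge-perspective degree distribution into the node perspective and evaluates the design rate with exact rational arithmetic. The dictionary encoding (degree mapped to coefficient) and the function names are assumptions made for this example only; they are not notation from the paper.

```python
from fractions import Fraction

def integral_0_1(edge_poly):
    """Integral over [0, 1] of sum_j c_j x^(j-1); edge_poly maps degree j -> c_j."""
    return sum(Fraction(c) / j for j, c in edge_poly.items())

def node_perspective(edge_poly):
    """Node-perspective coefficients: Lambda_j = (lambda_j / j) / integral(lambda)."""
    total = integral_0_1(edge_poly)
    return {j: (Fraction(c) / j) / total for j, c in edge_poly.items()}

def design_rate(lam, rho):
    """Design rate r = 1 - integral(rho) / integral(lambda)."""
    return 1 - integral_0_1(rho) / integral_0_1(lam)

# (3,6)-regular ensemble: lambda(x) = x^2, rho(x) = x^5.
regular_lam = {3: Fraction(1)}
regular_rho = {6: Fraction(1)}

# Irregular example: lambda(x) = 0.3x + 0.3x^2 + 0.4x^13, rho(x) = x^6.
irregular_lam = {2: Fraction(3, 10), 3: Fraction(3, 10), 14: Fraction(4, 10)}
irregular_rho = {7: Fraction(1)}

if __name__ == "__main__":
    print(design_rate(regular_lam, regular_rho))
    print(float(design_rate(irregular_lam, irregular_rho)))
```

Exact fractions are used so that the identity $1 - \Lambda'(1)/\Gamma'(1)$ can be checked against the integral form without rounding effects.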

The BP threshold, call it $\epsilon^{\mathrm{BP}} = \epsilon^{\mathrm{BP}}(\lambda, \rho)$, is defined in
[15]-[18] as
$\epsilon^{\mathrm{BP}} = \sup\{\epsilon \in [0,1] : \epsilon\lambda(1 - \rho(1 - x)) < x, \ \forall x \in (0, 1]\}$.
Operationally, if we transmit at $\epsilon < \epsilon^{\mathrm{BP}}$ and
use a BP decoder, then all bits except possibly a sub-linear
fraction can be recovered when $n \to \infty$. On the other hand, if
$\epsilon > \epsilon^{\mathrm{BP}}$, then a fixed fraction of bits remains erased after BP
decoding when $n \to \infty$. In a similar manner we can define the
MAP threshold. This threshold was first found via the replica
method in [19]. Further, in [2] a simple counting argument
leading to an upper bound for this threshold was given. The
argument is explained and sharpened in Sec. V. In this paper
we develop the point of view taken in [1]. The reference
quantity is then the extrinsic (see footnote 3) entropy, in short EXIT
(see footnote 4). The EXIT curve associated to the $i$-th variable is a
function of the channel entropy and it is defined as
$H(X_i \mid Y_{[n]\setminus\{i\}})$. Hereby,
$X_i$ represents the $i$-th input bit and, for $S \subseteq [n] = \{1, \dots, n\}$,
$X_S$ represents the $|S|$-tuple of all bits indexed by $S$. For
notational simplicity, let us write $X_{\sim i} = X_{[n]\setminus\{i\}}$ when a
single bit is omitted and $X = X_{[n]}$ for the entire vector. The
uniformly averaged quantity $\frac{1}{n}\sum_{i=1}^{n} H(X_i \mid Y_{\sim i})$ is called the
EXIT function. Recall that if there is a uniform prior on
the set of hypotheses then the maximum a posteriori and the
maximum likelihood decoding rule are identical. Let
$\Phi_i^{\mathrm{MAP}} = \phi_i^{\mathrm{MAP}}(Y_{\sim i})$ denote the extrinsic MAP bit estimate (sometimes
called extrinsic information) associated to the $i$-th bit. This
can be any sufficient statistic for $X_i$ given $Y_{\sim i}$. Since we
deal with binary variables, we can always think of it as the
conditional expectation $\phi_i^{\mathrm{MAP}}(Y_{\sim i}) = E[X_i \mid Y_{\sim i}]$. Observe that
$H(X_i \mid Y_{\sim i}) = H(X_i \mid \Phi_i^{\mathrm{MAP}})$.

2 The changes of representation are obtained via
$\Lambda(x) = \frac{1}{\int_0^1 \lambda}\int_0^x \lambda(u)\,du$,
$\Gamma(x) = \frac{1}{\int_0^1 \rho}\int_0^x \rho(u)\,du$,
$\lambda(x) = \Lambda'(x)/\Lambda'(1)$ and $\rho(x) = \Gamma'(x)/\Gamma'(1)$.
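The definition of the BP threshold given in this subsection can be evaluated numerically. The sketch below is an illustration only: it assumes the (3,6)-regular pair $\lambda(x) = x^2$, $\rho(x) = x^5$, checks the condition $\epsilon\lambda(1 - \rho(1 - x)) < x$ on a finite grid of $x$ values (an approximation of the "for all $x$" requirement), and bisects on $\epsilon$. The function names and grid size are hypothetical choices.

```python
def lam(x):
    """Edge-perspective variable dd of the (3,6)-regular ensemble (assumed example)."""
    return x ** 2

def rho(x):
    """Edge-perspective check dd of the (3,6)-regular ensemble (assumed example)."""
    return x ** 5

def bp_converges(eps, grid=20000):
    """Check eps * lam(1 - rho(1 - x)) < x on the grid x = k/grid, k = 1..grid."""
    return all(eps * lam(1 - rho(1 - k / grid)) < k / grid
               for k in range(1, grid + 1))

def bp_threshold(tol=1e-6):
    """Bisection for the supremum of eps satisfying the grid version of the condition."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bp_converges(mid):
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    print(round(bp_threshold(), 4))
```

Because the condition is only tested on a grid, the result is an approximation whose accuracy depends on the grid resolution near the critical point.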

B. Overview of Results 

Consider a dd pair $(\lambda, \rho)$ and the corresponding sequence of
ensembles LDPC$(n, \lambda, \rho)$ of increasing length $n$. Fig. 1 shows
the asymptotic EXIT curve for the regular dd pair
$(\lambda(x) = x^2, \rho(x) = x^5)$. Formally, this EXIT curve is
$h^{\mathrm{MAP}}(\epsilon) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} H(X_i \mid Y_{\sim i}(\epsilon)) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} H(X_i \mid \Phi_i^{\mathrm{MAP}})$.
Its main characteristics are as follows: the function is zero
below the MAP threshold $\epsilon^{\mathrm{MAP}}$, it jumps at $\epsilon^{\mathrm{MAP}}$ to a non-zero
value and continues then smoothly until it reaches one for
$\epsilon = 1$. The area under the EXIT curve equals the rate of
the code, see [12]. Compare this to the equivalent function
of the BP decoder which is also shown in Fig. 1. The BP
EXIT curve $h^{\mathrm{BP}}(\epsilon) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} H(X_i \mid \Phi_i^{\mathrm{BP}})$ corresponds to
running a BP decoder on a very large graph until the decoder
has reached a fixed point. The extrinsic entropy of the bits at
this fixed point gives the BP EXIT curve. This curve is given
in parametric form by

$$\left(\frac{x}{\lambda(1-\rho(1-x))},\ \Lambda(1 - \rho(1 - x))\right), \qquad (1)$$

where $x$ indicates the erasure probability of the variable-to-check
messages. To see this, note that when transmission takes
place over BEC($\epsilon$), then the BP decoder reaches a fixed point
$x$ which is given by the solution of the density evolution (DE)
equation $\epsilon\lambda(1 - \rho(1 - x)) = x$. We can therefore express $\epsilon$ as
$\epsilon(x) = \frac{x}{\lambda(1-\rho(1-x))}$. Now the average extrinsic probability that a bit is
still erased at the fixed point is equal to $\Lambda(1 - \rho(1 - x))$. Note
that the BP EXIT curve is the trace of this parametric equation
for $x$ starting at $x = 1$ until $x = x^{\mathrm{BP}}$. This is the critical point
and $\epsilon(x^{\mathrm{BP}}) = \epsilon^{\mathrm{BP}}$. Summarizing, the BP EXIT curve is zero
up to the BP threshold $\epsilon^{\mathrm{BP}}$ where it jumps to a non-zero value
and then continues smoothly until it reaches one at $\epsilon = 1$.

Fig. 1. BP and MAP EXIT curves for the dd pair $(\lambda(x) = x^2, \rho(x) = x^5)$.
(a) BP EXIT curve $h^{\mathrm{BP}}(\epsilon)$: its parametric equation is stated in (1). It is zero
until $\epsilon^{\mathrm{BP}}$ at which point it jumps. It further continues smoothly until it reaches
one at $\epsilon = 1$. (b) MAP EXIT curve $h^{\mathrm{MAP}}(\epsilon)$. Note that the figure (b) includes
also the "spurious" branch of Eq. (1). The spurious branch corresponds to
unstable fixed points. The MAP threshold is determined by the balance of the
two dark gray areas.

3 The term extrinsic is used when the observation of the bit itself is ignored,
see [20], [21].

4 The term EXIT, introduced in [11], stands for extrinsic (mutual) information
transfer. Rather than using mutual information we opted to use entropies
which in our setting simply means one minus mutual information. It is natural
to use entropy in the setting of the binary erasure channel since the parameter
$\epsilon$ itself represents the channel entropy.

Fig. 2. Balance of areas for the Maxwell decoder between the number of
guesses in (a) and the number of contradictions in (b). The two dark gray
areas are equal at the MAP threshold. These two areas differ from the areas
indicated in Fig. 1 only by a common part.
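The parametric description of the BP EXIT curve lends itself to a short numerical sketch. The code below assumes the (3,6)-regular pair, for which $\lambda(x) = x^2$, $\rho(x) = x^5$ and $\Lambda(x) = x^3$; it locates the critical point by a grid search (an approximation) and then inverts $\epsilon(x)$ by bisection on the stable branch. All function and constant names are illustrative choices, not part of the paper.

```python
def eps_of_x(x):
    """Channel parameter whose DE fixed point is x: x / lambda(1 - rho(1 - x))."""
    return x / (1 - (1 - x) ** 5) ** 2

def exit_of_x(x):
    """Extrinsic erasure probability at the fixed point: Lambda(1 - rho(1 - x))."""
    return (1 - (1 - x) ** 5) ** 3

GRID = 10000
# Critical point: grid approximation of the minimizer of eps(x) on (0, 1].
X_BP = min((k / GRID for k in range(1, GRID + 1)), key=eps_of_x)
EPS_BP = eps_of_x(X_BP)

def h_bp(eps, tol=1e-12):
    """BP EXIT value at channel parameter eps (zero below the BP threshold)."""
    if eps < EPS_BP:
        return 0.0
    lo, hi = X_BP, 1.0  # eps(x) is increasing on this branch for regular ensembles
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if eps_of_x(mid) < eps:
            lo = mid
        else:
            hi = mid
    return exit_of_x(hi)

if __name__ == "__main__":
    for e in (0.3, 0.43, 0.6, 1.0):
        print(e, round(h_bp(e), 4))
```

The jump at the threshold shows up as the difference between `h_bp` just below and just above `EPS_BP`.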

In [1] it was pointed out that for the investigated cases the 
following two curious relationships between these two curves 
hold: First, the BP and the MAP curve coincide above e MAP . 
Second, the MAP curve can be constructed from the BP curve 
in the following way. If we draw the BP curve as parameterized 
in (1) not only for $x \in [x^{\mathrm{BP}}, 1]$ but also for $x \in (0, x^{\mathrm{BP}})$ we
get the curve shown in the right picture of Fig. 1. Notice that
the branch for $x \in (0, x^{\mathrm{BP}})$ corresponds to unstable fixed points
under BP decoding. Moreover, the fraction of erased messages 
x decreases along this branch when the erasure probability is 
increased and it satisfies e(x) > e. Because of these peculiar 
features, it is usually considered as "spurious". To determine 
the MAP threshold take a vertical line at e = e BP and shift it 
to the right until the area which lies to the left of this line and 
is enclosed by the line and the BP EXIT curve is equal to the 
area which lies to the right of the line and is enclosed by the 
line and the BP EXIT curve (these areas are indicated in dark 
gray in the picture). This unique point determines the MAP
threshold. The MAP EXIT curve is now the curve which is
zero to the left of the threshold and equals the iterative curve to 
the right of this threshold. In other words, the MAP threshold 
is determined by a balance between two areas. It turns out 
that there is an operational meaning to this balance condition. 
We define the so-called Maxwell (M) decoder which performs 
MAP decoding by combining BP decoding with guessing. The 
dark gray areas in the right picture of Fig. 1 differ from
the ones in Fig. 2 only by a common part. We can show
that the gray area on the left is connected to the number 
of "guesses" the M decoder has to venture, while the gray 
area on the right represents the number of "confirmations" 
regarding these guesses. The MAP threshold is determined 
by the condition that the number of confirmations balances 
the number of guesses (i.e., that each guess is confirmed), 
and therefore the two areas are equal: in other words, at the 
MAP threshold (and below) there is just a single codeword 
compatible with the channel received bits. 
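The balance condition can be turned into a small computation. The sketch below assumes the (3,6)-regular pair with design rate 1/2 and uses two facts stated in the text: the area under the MAP EXIT curve equals the rate, and the MAP curve coincides with the BP curve above the MAP threshold. It therefore searches for the point on the stable branch to the right of which the area under the BP EXIT curve equals the design rate. Equating this area with the design rate, the trapezoidal integration, and all names are assumptions of this illustration.

```python
def eps_of_x(x):
    """eps(x) = x / lambda(1 - rho(1 - x)) for lambda(x) = x^2, rho(x) = x^5."""
    return x / (1 - (1 - x) ** 5) ** 2

def exit_of_x(x):
    """Lambda(1 - rho(1 - x)) with Lambda(x) = x^3."""
    return (1 - (1 - x) ** 5) ** 3

DESIGN_RATE = 0.5  # design rate of the (3,6)-regular ensemble

def area_right_of(x_start, steps=4000):
    """Trapezoidal area under the curve (eps(x), exit(x)) for x in [x_start, 1]."""
    area = 0.0
    e_prev, h_prev = eps_of_x(x_start), exit_of_x(x_start)
    for k in range(1, steps + 1):
        x = x_start + (1 - x_start) * k / steps
        e, h = eps_of_x(x), exit_of_x(x)
        area += 0.5 * (h + h_prev) * (e - e_prev)
        e_prev, h_prev = e, h
    return area

def map_threshold(tol=1e-7):
    """Channel parameter at which the area to the right equals the design rate."""
    x_bp = min((k / 10000 for k in range(1, 10001)), key=eps_of_x)
    lo, hi = x_bp, 1.0  # the area decreases as the starting point moves right
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_right_of(mid) > DESIGN_RATE:
            lo = mid
        else:
            hi = mid
    return eps_of_x(lo)

if __name__ == "__main__":
    print(round(map_threshold(), 4))
```

For ensembles with several jumps the same idea would have to be applied locally, so this sketch should be read as covering only the single-jump case.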

The EXIT curves depicted in Fig. 1 are representative for a
large family of degree distributions, e.g., those of regular LDPC
ensembles. But more complicated scenarios are possible. Fig. 3
depicts a slightly more general case in which the BP EXIT curve
and the MAP EXIT curve have two jumps. As can be seen from this
figure, the same kind of balance condition holds in this case
locally and it determines the position of each jump.

C. Paper Outline 

We start by considering the conditional entropy H(X\Y), 
where X is the transmitted codeword and Y the received 
sequence, and we derive the so-called Area Theorem for finite- 
length codes. When applying the Area Theorem to the binary 
erasure channel, the notion of EXIT curve enters explicitly. 
Next, we show that when the codes are chosen randomly from 
a suitably defined ensemble then the individual conditional
entropies and EXIT curves concentrate around their ensemble 
averages. This is the first step towards the asymptotic analysis. 

We continue by defining the three asymptotic EXIT curves 
of interest. These are the (MAP) EXIT curve, the BP
EXIT curve, and the EBP EXIT curve (EBP stands for extended
BP; this curve includes the spurious branch). We show that
the Area Theorem remains valid in the asymptotic setting. As 
an immediate consequence we will see that for some classes 
of ensembles (roughly those for which the stability condition 
determines the threshold) BP decoding coincides with MAP 
decoding. 

We then present a key point of the paper, which is the
derivation of an upper bound for the MAP threshold. Several
examples illustrate this technique and suggest the
tightness of the bound.

Fig. 3. BP (dashed and solid line) and MAP (thick solid line) EXIT
curves for the ensemble discussed in Examples 7 and 10. Both curves
have two jumps. The two jumps of the MAP EXIT curve are both
determined by a local balance of areas.

The same result is recovered through a counting argument 
that, supplemented by a combinatorial calculation, implies the 
tightness of the bound. 

Finally, we introduce the so-called M decoder which pro- 
vides a unified framework for understanding the connection 
between the BP and the MAP decoder. A closer analysis of 
the performance of the M decoder will allow us to prove a 
refined upper bound on the MAP threshold and it will give 
rise to a pleasing interpretation of the MAP threshold as the
parameter value at which two areas under the EBP EXIT curve are
in balance.

We conclude the paper by discussing some applications of 
our method. 



II. Finite-Length Codes: Area Theorem and 
Concentration 

Let X be the transmitted codeword and let Y be the received 
word. The conditional entropy H(X | Y) is of fundamental 
importance if we consider the question whether reliable com- 
munication is possible. Let us see how this quantity appears 
naturally in the context of decoding. To this end, we first recall 
the original Area Theorem as introduced in [12]. 

Theorem 1 (Area Theorem): Let $X$ be a binary vector of
length $n$ chosen with probability $p_X(x)$ from a finite set. Let $Y$
be the result of passing $X$ through BEC($\epsilon$). Let $\Omega$ be a further
observation of $X$ so that $p_{\Omega \mid X,Y}(\omega \mid x, y) = p_{\Omega \mid X}(\omega \mid x)$. To
emphasize that $Y$ depends on the channel parameter $\epsilon$ we write
$Y(\epsilon)$. Then

$$H(X \mid \Omega) = \int_0^1 \sum_{i=1}^{n} H(X_i \mid Y_{\sim i}(\epsilon), \Omega)\, d\epsilon. \qquad (2)$$



The reader familiar with the original statement in [12] will 
have noticed that we have rephrased the theorem. First, we 
expressed the result in terms of entropy instead of mutual 
information. Second, the observations $Y$ and $\Omega$ represent what
in the original theorem were called the "extrinsic" information
and the "channel," respectively.

In (2) the integration ranges from zero (perfect channel)
to one (no information conveyed). The following is a trivial 
extension. 

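Theorem 2 (Area Theorem): Let $X$ be a binary vector of
length $n$ chosen with probability $p_X(x)$ from a finite set. Let $Y$
be the result of passing $X$ through BEC($\epsilon$). Let $\Omega$ be a further
observation of $X$ so that $p_{\Omega \mid X,Y}(\omega \mid x, y) = p_{\Omega \mid X}(\omega \mid x)$.
Then

$$H(X \mid Y(\epsilon^*), \Omega) = \int_0^{\epsilon^*} \sum_{i=1}^{n} H(X_i \mid Y_{\sim i}(\epsilon), \Omega)\, d\epsilon.$$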

Proof of Theorem 2: Let $Y^{(1)}$ be the result of passing $X$
through BEC($\epsilon$) and $Y^{(2)}$ be the result of passing $X$ through
BEC($\epsilon^*$). Let $\Omega$ be the additional observation of $X$. Applying
Theorem 1 with $Y = Y^{(1)}$ and with additional observation
$(Y^{(2)}, \Omega)$, we have
$p_{\Omega, Y^{(2)} \mid X, Y^{(1)}}(\omega, y^{(2)} \mid x, y^{(1)}) = p_{\Omega, Y^{(2)} \mid X}(\omega, y^{(2)} \mid x)$,
as required, so that we get

$$H(X \mid Y^{(2)}(\epsilon^*), \Omega) = \int_0^1 \sum_{i \in [n]} H(X_i \mid Y^{(1)}_{\sim i}(\epsilon), Y^{(2)}(\epsilon^*), \Omega)\, d\epsilon.$$

Now note that

$$H(X_i \mid Y^{(1)}_{\sim i}(\epsilon), Y^{(2)}(\epsilon^*), \Omega) = \epsilon^* H(X_i \mid Y_{\sim i}(\epsilon \cdot \epsilon^*), \Omega).$$

This is true since the bits of $Y^{(1)}_{\sim i}(\epsilon)$ and $Y^{(2)}(\epsilon^*)$ are erased
independently (so that the respective erasure probabilities
multiply) and since $Y^{(2)}(\epsilon^*)$ contains the intrinsic observation
of bit $X_i$, which is erased with probability $\epsilon^*$. If we now
substitute the right hand side of the last expression in our
previous integral and make the change of variables $\epsilon' = \epsilon \cdot \epsilon^*$,
Theorem 2 follows. ■
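The Area Theorem for the BEC can be checked numerically on a small code by brute force. The sketch below is an illustration under stated assumptions: it uses a [7,4] Hamming code (a choice made here, not an example from the paper), omits the extra observation $\Omega$, measures entropies in bits, and relies on linearity so that only the all-zero transmitted codeword needs to be considered. It compares $H(X \mid Y(\epsilon^*))$ with the integral of the summed extrinsic entropies, computed by Simpson's rule.

```python
from itertools import product
from math import log2

# Rows of a systematic generator matrix of a [7,4] Hamming code, stored as
# bitmasks (illustrative choice of code).
GEN = [0b1000110, 0b0100101, 0b0010011, 0b0001111]
N = 7

def all_codewords(gen):
    """Enumerate every GF(2) linear combination of the generator rows."""
    words = []
    for coeffs in product((0, 1), repeat=len(gen)):
        w = 0
        for c, g in zip(coeffs, gen):
            if c:
                w ^= g
        words.append(w)
    return words

CODE = all_codewords(GEN)

def pattern_prob(mask, eps, length):
    """Probability of a given erasure pattern over `length` positions."""
    k = bin(mask).count("1")
    return eps ** k * (1 - eps) ** (length - k)

def cond_entropy(eps):
    """H(X | Y(eps)) in bits: average log-count of codewords consistent with Y."""
    total = 0.0
    for erased in range(1 << N):
        consistent = sum(1 for c in CODE if (c & ~erased) == 0)
        total += pattern_prob(erased, eps, N) * log2(consistent)
    return total

def exit_sum(eps):
    """sum_i H(X_i | Y_{~i}(eps)) in bits."""
    total = 0.0
    for i in range(N):
        bit = 1 << i
        for erased in range(1 << N):
            if erased & bit:
                continue  # enumerate patterns on the other n-1 positions only
            # X_i stays undetermined iff some codeword with c_i = 1 vanishes
            # on every unerased position other than i.
            if any((c & bit) and (c & ~(erased | bit)) == 0 for c in CODE):
                total += pattern_prob(erased, eps, N - 1)
    return total

def integrate(f, a, b, steps=100):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

if __name__ == "__main__":
    print(cond_entropy(0.5), integrate(exit_sum, 0.0, 0.5))
```

The enumeration is exponential in the blocklength, so this check is only practical for very short codes.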

Assume that we allow each $X_i$ to be passed through a
different channel BEC($\epsilon_i$). Rather than phrasing our result
specifically for the case of the BEC($\epsilon_i$), let us state the area
theorem right away in its general form as introduced in [4]. In
this paper we will only be interested in the consequences as
they pertain to transmission over the BEC($\epsilon$). The investigation
of the general case is relegated to the companion paper [22].

In order to state this and subsequent results in a more 
compact form we introduce the following definition. 

Definition 1 (Channel Smoothness): Consider a family of
memoryless channels with input and output alphabets $\mathcal{X}$
and $\mathcal{Y}$, respectively, and characterized by their transition
probability distribution functions (pdf's) $p_{Y \mid X}(y \mid x)$. If $\mathcal{Y}$ is
discrete, we interpret $p_{Y \mid X}(\cdot \mid x)$ as a pdf with respect to the
counting measure. If $\mathcal{Y}$ is continuous, $p_{Y \mid X}(y \mid x)$ is a density
with respect to Lebesgue measure. Assume that the family
is parameterized by $\epsilon$, where $\epsilon$ takes values in some interval
$I \subseteq \mathbb{R}$. The channel is said to be smooth with respect to the
parameter $\epsilon$ if the pdf's $\{p_{Y \mid X}(y \mid x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$ are
differentiable functions of $\epsilon \in I$.

Notice that, if a channel family is smooth, then several basic
properties of the channel are likely to be differentiable with
respect to the channel parameter. A basic (but important)
example is the channel conditional entropy
$H(Y \mid X) = E[-\log\{p_{Y \mid X}(Y \mid X)\}]$ given a reference measure $p_X(x)$
on $\mathcal{X}$. Suppose that $\mathcal{Y}$ is finite, and that, for any $\epsilon \in I$,
$p_{Y \mid X}(y \mid x) > 0$ for any $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Then

$$\frac{dH(Y \mid X)}{d\epsilon} = \sum_{x, y} p_X(x)\, \frac{dp_{Y \mid X}(y \mid x)}{d\epsilon}\, \log\left\{\frac{1}{p_{Y \mid X}(y \mid x)}\right\}.$$

In other words, differentiability of $H(Y \mid X)$ follows from
differentiability of $p_{Y \mid X}(y \mid x)$ and of $-x \log x$. In this paper
we consider families of binary erasure channels which are
trivially smooth with respect to the parameter $\epsilon$.
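The derivative formula for the channel conditional entropy can be sanity-checked with a finite difference. The sketch below uses a binary symmetric channel with crossover probability $\epsilon$, chosen here only because all its transition probabilities are strictly positive as the formula requires; the prior, step size, and function names are assumptions of this example. Natural logarithms are used throughout.

```python
from math import log

def bsc(eps):
    """Transition probabilities p(y | x) of a BSC, keyed by (x, y)."""
    return {(0, 0): 1 - eps, (0, 1): eps, (1, 0): eps, (1, 1): 1 - eps}

def bsc_derivative(eps):
    """Derivatives of the transition probabilities with respect to eps."""
    return {(0, 0): -1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): -1.0}

def cond_entropy(channel, prior, eps):
    """H(Y | X) = -sum_{x,y} p(x) p(y|x) log p(y|x), in nats."""
    p = channel(eps)
    return -sum(prior[x] * p[(x, y)] * log(p[(x, y)]) for (x, y) in p)

def entropy_derivative(channel, dchannel, prior, eps):
    """sum_{x,y} p(x) * dp(y|x)/deps * log(1 / p(y|x))."""
    p, dp = channel(eps), dchannel(eps)
    return sum(prior[x] * dp[(x, y)] * log(1 / p[(x, y)]) for (x, y) in p)

if __name__ == "__main__":
    prior = {0: 0.3, 1: 0.7}
    eps, step = 0.2, 1e-6
    numeric = (cond_entropy(bsc, prior, eps + step)
               - cond_entropy(bsc, prior, eps - step)) / (2 * step)
    print(numeric, entropy_derivative(bsc, bsc_derivative, prior, eps))
```

For the BSC both values agree with $\log((1-\epsilon)/\epsilon)$, independently of the prior.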

Theorem 3 (General Area Theorem [4]): Let $X$ be a binary
vector of length $n$ chosen with probability $p_X(x)$ from a
finite set. Let the channel from $X$ to $Y$ be memoryless, where
$Y_i$ is the result of passing $X_i$ through a smooth channel with
parameter $\epsilon_i$, $\epsilon_i \in I_i$. Let $\Omega$ be a further observation of $X$ so
that $p_{\Omega \mid X,Y}(\omega \mid x, y) = p_{\Omega \mid X}(\omega \mid x)$. Then

$$dH(X \mid Y, \Omega) = \sum_{i=1}^{n} \frac{\partial H(X_i \mid Y, \Omega)}{\partial \epsilon_i}\, d\epsilon_i. \qquad (3)$$



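Proof: For $i \in [n]$, the chain rule of entropy gives
$H(X \mid Y, \Omega) = H(X_i \mid Y, \Omega) + H(X_{\sim i} \mid X_i, Y, \Omega)$.
We have $p_{X_{\sim i} \mid X_i, Y, \Omega} = p_{X_{\sim i} \mid X_i, Y_{\sim i}, \Omega}$
since the channel is memoryless and $p_{\Omega \mid X, Y} = p_{\Omega \mid X}$.
Therefore, $H(X_{\sim i} \mid X_i, Y, \Omega) = H(X_{\sim i} \mid X_i, Y_{\sim i}, \Omega)$,
which does not depend on $\epsilon_i$, and

$$\frac{\partial H(X \mid Y, \Omega)}{\partial \epsilon_i} = \frac{\partial H(X_i \mid Y, \Omega)}{\partial \epsilon_i}.$$

From this the total derivative as stated in (3) follows immediately. ■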
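Alternative proof of Theorem 2: Keeping in mind that
transmission takes place over a binary erasure channel, we
write

$$H(X_i \mid Y, \Omega) = \sum_{y_i \in \mathcal{Y}} p_{Y_i}(y_i)\, H(X_i \mid Y_i = y_i, Y_{\sim i}, \Omega).$$

The terms corresponding to $y_i \in \{0, 1\}$ vanish because
$X_i$ is then completely determined by the channel output.
The remaining term yields $H(X_i \mid Y, \Omega) = \epsilon_i H(X_i \mid Y_{\sim i}, \Omega)$,
because $p_{Y_i}(*) = \epsilon_i$, and the occurrence at the channel output
of an erasure at position $i$ is independent from $X$, $Y_{\sim i}$ and $\Omega$.
We can then write

$$dH(X \mid Y, \Omega) = \sum_{i \in [n]} \frac{\partial H(X_i \mid Y, \Omega)}{\partial \epsilon_i}\, d\epsilon_i = \sum_{i \in [n]} H(X_i \mid Y_{\sim i}, \Omega)\, d\epsilon_i,$$

which, when we assume that $\epsilon_i = \epsilon$ for all $i \in [n]$, gives
Theorem 2. ■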

A few remarks are in order. First, the additional degree of
freedom afforded by allowing an extra observation $\Omega$ is useful
when studying the dynamical behavior of certain iterative
coding schemes via EXIT chart arguments. (For example, in a
parallel concatenation, $Y$ typically represents the observation
of the systematic bits and $\Omega$ represents the fixed channel
observation of the parity bits.) For the purpose of this paper,
however, the additional observation $\Omega$ is not needed since
we are not concerned with componentwise EXIT charts. We
will therefore skip $\Omega$ in the sequel. Second, as emphasized
in the last step in the previous proof, we can assume at this
point, more generally, that the individual channel parameters
$\epsilon_i$ are not the same but that the individual channels are all
parametrized by a common parameter $\epsilon$. For instance one
may think of a family $\{\mathrm{BEC}(\epsilon_i)\}$ where the $\epsilon_i(\epsilon)$ are smooth
functions of $\epsilon \in [0, 1]$. In the simplest case some parameter
might be chosen to be constant. This degree of freedom allows 
for an elegant proof of Theorem [8] 

One of the main aims of this paper is to investigate 
the MAP performance of sparse graph codes in the limit 
of large blocklengths. Our task is made much easier by 
realizing that we can restrict our study to the average such 
performance. More precisely, let $G$ be chosen uniformly at
random from LDPC$(\lambda, \rho, n)$ and let $H_G(X \mid Y)$ denote the
conditional entropy for the code $G$. We state the following
theorems right away for general binary memoryless symmetric 
(BMS) channels. 

Theorem 4 (Concentration of Conditional Entropy): Let $G$
be chosen uniformly at random from LDPC$(n, \lambda, \rho)$. Assume
that $G$ is used to transmit over a BMS channel. By some
abuse of notation, let $H_{G(n)} = H_G(X \mid Y)$ be the associated
conditional entropy. Then for any $\xi > 0$

$$\Pr\left\{\left|H_{G(n)} - E[H_{G(n)}]\right| > n\xi\right\} \le 2e^{-nB\xi^2},$$

where $B = 1/(2(\mathtt{r}_{\max} + 1)^2 (1 - r))$ and where $\mathtt{r}_{\max}$ is the
maximal check-node degree.

Proof: The proof uses the standard technique of first con- 
structing a Doob's martingale with bounded differences and 
then applying the Hoeffding-Azuma inequality. The complete 
proof can be found in [23] and it is reported in an adapted
and streamlined form in Appendix I. ■
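To get a feeling for the scale at which the concentration bound of Theorem 4 becomes informative, it can simply be evaluated. The sketch below assumes the bound has the form $2e^{-nB\xi^2}$ with $B = 1/(2(\mathtt{r}_{\max}+1)^2(1-r))$ as read from the statement above; the function name and the example parameters (a (3,6)-regular ensemble, so $\mathtt{r}_{\max} = 6$ and $r = 1/2$) are illustrative choices.

```python
from math import exp

def entropy_concentration_bound(n, xi, r_max, rate):
    """Evaluate min(1, 2 * exp(-n * B * xi^2)) with B = 1 / (2 (r_max + 1)^2 (1 - rate))."""
    b = 1.0 / (2 * (r_max + 1) ** 2 * (1 - rate))
    return min(1.0, 2 * exp(-n * b * xi ** 2))

if __name__ == "__main__":
    # Deviation of 1% of the blocklength for growing n.
    for n in (10 ** 4, 10 ** 6, 10 ** 7):
        print(n, entropy_concentration_bound(n, 0.01, r_max=6, rate=0.5))
```

The bound is clipped at one because values above one carry no information about a probability.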
Let us now consider the concentration of the MAP
EXIT curve. For the BEC this curve is given equivalently by
$\frac{1}{n}\sum_{i=1}^{n} H_{G(n)}(X_i \mid Y_{\sim i}(\epsilon))$ or by
$\frac{1}{n}\frac{dH_{G(n)}(X \mid Y(\epsilon))}{d\epsilon}$. We
choose the second representation and phrase the statement in
terms of the derivative of the conditional entropy with respect
to the channel parameter $\epsilon$.

Theorem 5 (Concentration of MAP EXIT Curve): Let $G$ be
chosen uniformly at random from LDPC$(n, \lambda, \rho)$ and let
$\{\mathrm{BMS}(\epsilon)\}_{\epsilon \in I}$ denote a family of BMS channels ordered by
physical degradation (with BMS($\epsilon'$) physically degraded with
respect to BMS($\epsilon$) whenever $\epsilon' > \epsilon$) and smooth with respect
to $\epsilon$. Assume that $G$ is used to transmit over the BMS($\epsilon$)
channel. Let $H_{G(n)} = H_G(X \mid Y)$ be the associated conditional
entropy. Denote by $H'_{G(n)}$ the derivative of $H_{G(n)}$ with respect
to $\epsilon$ (such a derivative exists because of the explicit calculation
presented in Theorem 3) and let $J \subseteq I$ be an interval on which
$\lim_{n\to\infty} \frac{1}{n} E[H_{G(n)}]$ exists and is differentiable with respect
to $\epsilon$. Then, for any $\epsilon \in J$ and $\xi > 0$ there exists an $\alpha_\xi > 0$
such that, for $n$ large enough

$$\Pr\left\{\left|H'_{G(n)} - E[H'_{G(n)}]\right| > n\xi\right\} \le e^{-n\alpha_\xi}.$$

Furthermore, if $\lim_{n\to\infty} \frac{1}{n} E[H_{G(n)}]$ is twice differentiable
with respect to $\epsilon \in J$, there exists a strictly positive constant
$A$ such that $\alpha_\xi > A\xi^4$.

The proof is deferred once more to Appendix I.

Notice the two extra hypotheses with respect to Theorem 4.
First, we assumed that the channel family $\{\mathrm{BMS}(\epsilon)\}_{\epsilon \in I}$ is
ordered by physical degradation. This ensures that $H'_{G(n)}$ is
non-negative. This condition is trivially satisfied for the family
$\{\mathrm{BEC}(\epsilon)\}_{\epsilon \in [0,1]}$. More generally, we can let $\epsilon$ be any function
of the erasure probability that is differentiable and increasing from
zero to one. The second condition, namely the existence and
differentiability of the expected entropy per bit in the limit,
is instead crucial. As discussed in the previous section (see,
e.g., Fig. 1), the asymptotic EXIT curve may have jumps. By
Theorem 2 these jumps correspond to discontinuities in the
derivative of the conditional entropy. At a jump $\epsilon^*$, the value of
the EXIT curve may vary dramatically when passing from one
element of the ensemble to the other. Some (a finite fraction)
of the codes will perform well, and have an EXIT curve close
to the asymptotic value at $\epsilon^* - \delta$, while others (a finite fraction)
may have an EXIT function close to the asymptotic value at
$\epsilon^* + \delta$ ($\delta$ is here a generic small positive number).

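Theorem 6 (Concentration of BP EXIT Curve): Let $G$ be
chosen uniformly at random from LDPC$(n, \lambda, \rho)$. Assume that
$G$ is used to transmit over a BMS channel and let
$\Phi_i^{\mathrm{BP},t} = \phi_i^{\mathrm{BP},t}(Y_{\sim i})$ denote the extrinsic estimate (conditional mean) of
$X_i$ produced by the BP decoder after $t$ iterations. Denote by
$H_i^{\mathrm{BP},t} = H_G(X_i \mid \Phi_i^{\mathrm{BP},t})$ the resulting (extrinsic) entropy of the
binary variable $X_i$. Then, for all $\xi > 0$, there exists $\alpha_\xi > 0$,
such that

$$\Pr\left\{\left|\sum_{i=1}^{n} H_i^{\mathrm{BP},t} - E\left[\sum_{i=1}^{n} H_i^{\mathrm{BP},t}\right]\right| > n\xi\right\} \le e^{-\alpha_\xi n}. \qquad (4)$$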



Proof: The proof is virtually identical to the ones given 
in [15], [17] where the probability of decoding error is 
considered. ■ 

III. Asymptotic Setting 

A. (MAP) EXIT 

The next definition and theorem define our main object of 
study. 

Definition 2: Let $\mathcal{C}(n)$ be a sequence of code ensembles
of diverging blocklength $n$ and let $G(n)$ be
chosen uniformly at random from $\mathcal{C}(n)$. Assume that

$$\lim_{n\to\infty} E_{G(n)}\left[\frac{1}{n}\sum_{i=1}^{n} H_{G(n)}(X_i \mid Y_{\sim i}(\epsilon))\right]$$

exists. Then this limit
is called the asymptotic EXIT function of the family of
ensembles and we denote it by $h^{\mathrm{MAP}}(\epsilon)$. We define the MAP
threshold $\epsilon^{\mathrm{MAP}}$ to be the supremum of all values $\epsilon$ such that
$h^{\mathrm{MAP}}(\epsilon) = 0$.

Given a dd pair $(\lambda, \rho)$, consider the sequence of ensembles
$\{\mathrm{LDPC}(\lambda, \rho, n)\}_n$. It is natural to conjecture that the
associated asymptotic EXIT function exists. Note that from
Theorem 5 we know that if this limit exists, then individual
code instances are closely concentrated around the ensemble 
average. It is therefore meaningful to define in such a setting 
the MAP threshold in terms of the ensemble average. 

Unfortunately, no general proof of the existence of the MAP 
EXIT curve is known. But we will show how one can in most 
cases compute the asymptotic EXIT function explicitly for 
a given ensemble, thus proving existence of the limit in such 
cases. See also [24] for a discussion on asymptotic thresholds. 

It is worth pointing out that we defined the MAP threshold 
to be the channel parameter at which the conditional entropy 
becomes sublinear. At this point the average conditional bit 
entropy converges to zero, so that this point is the bit MAP 
threshold. We note that for some ensembles the block MAP 
threshold is strictly smaller than the bit MAP threshold. 

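Theorem 7 (Asymptotic Area Theorem): Consider a
dd pair $(\lambda, \rho)$. Assume that the associated asymptotic
EXIT function as defined in Definition 2 exists
for all $\epsilon \in [0,1]$. Assume further that the limit

$$r_{\mathrm{as}} = \lim_{n\to\infty} E_{G(n)}\left[\frac{H(X)}{n}\right]$$

exists. Then

$$r_{\mathrm{as}} = \int_0^1 h^{\mathrm{MAP}}(\epsilon)\, d\epsilon.$$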



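Proof: Let $h^{\mathrm{MAP}}_{G(n)}(\epsilon)$ denote the EXIT function associated
to a particular $G \in \mathrm{LDPC}(\lambda, \rho, n)$ with rate $r_{G(n)}$. We have

$$\int_0^1 E_{G}\left[h^{\mathrm{MAP}}_{G(n)}(\epsilon)\right] d\epsilon = E_{G}\left[\int_0^1 h^{\mathrm{MAP}}_{G(n)}(\epsilon)\, d\epsilon\right] = E_{G}\left[\frac{H(X)}{n}\right]. \qquad (5)$$

The first equality is obtained by noticing that the function
$h^{\mathrm{MAP}}_{G(n)}(\epsilon)$ is non-negative. We are therefore justified by the Fubini
theorem to switch the order of integration. The second step
follows from the Area Theorem (the rate being equal to $H(X)/n$).

On the other hand, the Dominated Convergence Theorem
can be applied to the sequence $\{E_{G}[h^{\mathrm{MAP}}_{G(n)}(\epsilon)]\}_n$ since it
converges (as assumed in the hypothesis) to $h^{\mathrm{MAP}}(\epsilon)$ and is
trivially upper-bounded by 1. We therefore get

$$\lim_{n\to\infty} \int_0^1 E_{G}\left[h^{\mathrm{MAP}}_{G(n)}(\epsilon)\right] d\epsilon = \int_0^1 \lim_{n\to\infty} E_{G}\left[h^{\mathrm{MAP}}_{G(n)}(\epsilon)\right] d\epsilon = \int_0^1 h^{\mathrm{MAP}}(\epsilon)\, d\epsilon,$$

which, combined with (5), concludes the proof. ■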
Lemma 7 gives a sufficient condition for the limit $r_{\mathrm{as}}$ to
exist. Note that under this condition the asymptotic rate $r_{\mathrm{as}}$
is equal to the design rate $r(\lambda, \rho)$. Most dd pairs $(\lambda, \rho)$
encountered in practice fulfill this condition. This condition is
therefore not very restrictive.

B. BP EXIT 

Recall that the MAP EXIT curve can be expressed as
$H(X_i \mid \Phi_i^{\mathrm{MAP}})$ where $\Phi_i^{\mathrm{MAP}} = \phi_i^{\mathrm{MAP}}(Y_{\sim i})$ is the posterior
estimate (conditional mean) of $X_i$ given $Y_{\sim i}$. Unfortunately this
quantity is not easy to evaluate. In fact, the main aim of this
paper is to accomplish this task.

A related quantity which is much easier to compute is
the BP EXIT curve shown in Fig. 4 for the dd pair $(x^2, x^5)$.
The BP EXIT corresponds to $H(X_i \mid \Phi_i^{\mathrm{BP}})$, where
$\Phi_i^{\mathrm{BP}} = \phi_i^{\mathrm{BP}}(Y_{\sim i})$ is the extrinsic estimate of $X_i$ delivered by the
BP decoder. Here a fixed number of iterations, let us say $t$,
is understood. Asymptotically, we consider $t \to \infty$ after
$n \to \infty$. An exact expression for the
average asymptotic BP EXIT curve for LDPC ensembles is
easily computed via the DE method [15]-[18].

Fig. 4. BP EXIT function $\epsilon \mapsto h^{\mathrm{BP}}(\epsilon)$.

Consider the fixed-point condition for the density evolution
equations,

$$\epsilon\lambda(1 - \rho(1 - x)) = x.$$

Solving for $\epsilon$, we get $\epsilon(x) = \frac{x}{\lambda(1-\rho(1-x))}$, $x \in (0, 1]$. In words,
for each non-zero fixed-point $x$ of density evolution, there is a
unique channel parameter $\epsilon$. At this fixed-point the asymptotic
average BP EXIT function equals $\Lambda(1 - \rho(1 - x))$. If $\epsilon(x)$ is
monotonically increasing in $x$ over the whole range $[0, 1]$, then
the BP EXIT curve is given in parametric form by

$$\left(\epsilon(x),\ \Lambda(1 - \rho(1 - x))\right). \qquad (6)$$

For some ensembles (e.g., regular cycle-code ensembles) $\epsilon(x)$
is indeed monotone increasing over the whole range $[0,1]$,
but for most ensembles this is not true. In this case we have
to restrict the above parameterization to the unique union of
intervals

$$\mathcal{I} = \bigcup_{i \in [J]} [\underline{x}^i, \overline{x}^i) \cup \{1\},$$

which has the property that $\epsilon(x)$ is continuous and monotonically
increasing from $\epsilon^{\mathrm{BP}}$ to one as $x$ takes on increasing
values in $\mathcal{I}$ and for all $i \in [J]$, $\underline{x}^i = 0$ or $\epsilon'(\underline{x}^i) = 0$. An
example of such a partition is shown in Fig. 5. That such a
partition exists and is unique follows from the fact that $\epsilon(x)$
is a differentiable function for $x \in [0, 1]$ as can be verified by
direct computation. Set $\overline{x}^J = 1$ and note that $\epsilon(1) = 1 > 0$.
Define $\underline{x}^J$ as the largest nonnegative value of $x < \overline{x}^J$ for which
$\epsilon'(x) = 0$. If no such value exists then $\epsilon(x)$ is monotonically
increasing over the whole range $[0, 1]$. In this case $J = 1$ and
we set $\underline{x}^J = 0$. Now proceed recursively. Assume that the
intervals $[\underline{x}^{i+1}, \overline{x}^{i+1})$ have been defined and that $\underline{x}^{i+1} > 0$.
Define $\overline{x}^i$ as the largest nonnegative value of $x < \underline{x}^{i+1}$ such
that $\epsilon(x) = \epsilon(\underline{x}^{i+1})$. Note that if such a value exists then
we must have $\epsilon'(x) > 0$. If no such value exists then we
have already found the sought after partition and we stop.
Otherwise define $\underline{x}^i$ as the largest nonnegative value of $x < \overline{x}^i$
for which $\epsilon'(x) = 0$. As before, if no such value exists then
set $\underline{x}^i = 0$ and stop. Without loss of generality we can eliminate
from the resulting partition any interval of zero length. Let $J$ denote
the number of remaining intervals of nonzero length. Note, if
the BP threshold happens at a discontinuous phase transition
(jump), then $x^{\mathrm{BP}} = \underline{x}^1$ and $\epsilon^{\mathrm{BP}} = \epsilon(\underline{x}^1)$; otherwise, if the BP
threshold is given by the stability condition, then $x^{\mathrm{BP}} = \underline{x}^1 = 0$
and $\epsilon^{\mathrm{BP}} = \epsilon(0)$. See also Fig. 5.

Fig. 5. BP EXIT curve with two discontinuities ($J = 2$): (a) Channel entropy
function $x \mapsto \epsilon(x)$ (b) BP EXIT function $\epsilon \mapsto h^{\mathrm{BP}}(\epsilon)$. This example
corresponds to the dd pair $(\lambda, \rho) = (0.3x + 0.3x^2 + 0.4x^{13}, x^6)$, which
has design rate $r \approx 0.48718$. The BP threshold is $\epsilon^{\mathrm{BP}} \approx 0.48437$ at
$x^{\mathrm{BP}} \approx 0.09904$. This is also the first discontinuity, i.e., $\epsilon^1 = \epsilon^{\mathrm{BP}}$, $\underline{x}^1 = x^{\mathrm{BP}}$
and $\overline{x}^1 \approx 0.22156$. The second discontinuity occurs for $\epsilon = \epsilon^2 \approx 0.51553$
at $\underline{x}^2 \approx 0.37016$ ($\overline{x}^2 = 1$).

Corollary 1: Assume we are given a dd pair $(\lambda, \rho)$ and that transmission takes place over the BEC. Let $\bigcup_{i \in [J]} [\underline{x}^i, \overline{x}^i) \cup \{1\}$ be the partition associated to $(\lambda, \rho)$. Define $\epsilon^{BP} = \epsilon(\underline{x}^1)$. Then the BP EXIT function $h^{BP}(\epsilon)$ is equal to zero for $0 \le \epsilon < \epsilon^{BP}$ and for $\epsilon > \epsilon^{BP}$ it has the parametric characterization

    $(\epsilon(x),\ \Lambda(1-\rho(1-x)))$,

where $x$ takes on all values in $\mathcal{I}$.

Fact 1 (Regular LDPC Ensembles "Jump" at Most Once): Consider the regular dd pair $(\lambda(x), \rho(x)) = (x^{\mathtt{l}-1}, x^{\mathtt{r}-1})$. Then the function $\epsilon(x) = x/\lambda(1-\rho(1-x))$ has a unique minimum in the range $[0,1]$. Let $x^{BP}$ denote the location of this minimum. Then $\epsilon(x)$ is strictly decreasing on $(0, x^{BP})$ and strictly increasing on $(x^{BP}, 1)$. Moreover, $x^{BP} = 0$ if and only if $\mathtt{l} = 2$.

Proof: Note that $\epsilon(1) = 1$ and by direct calculation we see that $\epsilon'(1) = 1$. Therefore, either $\epsilon(x)$ takes on its minimum value within the interval $[0, 1]$ for $x = 0$ or its minimum value is in the interior of the region $[0, 1]$. Computing explicitly the derivative of $\epsilon(x)$, we see that the location of the minima of $\epsilon(x)$ must be a root of

    $W(x) = 1 - (1-x)^{\mathtt{r}-1} - (\mathtt{l}-1)(\mathtt{r}-1)(1-x)^{\mathtt{r}-2} x$.

Furthermore

    $W'(x) = -(\mathtt{r}-1)(1-x)^{\mathtt{r}-3} \{(\mathtt{l}-2) - [(\mathtt{l}-1)(\mathtt{r}-1) - 1] x\}$.

Notice that $W(0) = 0$, $W'(0) = -(\mathtt{r}-1)(\mathtt{l}-2) < 0$ (for $\mathtt{l} > 2$) and $W(1) = 1$. By the Intermediate Value Theorem, $W(x)$ vanishes at least once in $(0, 1)$. Suppose now that $W(x)$ vanishes more than once in $(0,1)$, and consider the first two such zeros $x_1$, $x_2$. It follows that $W'(x)$ must vanish at least twice: once in $(0, x_1)$ and once in $(x_1, x_2)$. On the other hand, the above explicit expression implies that $W'(x)$ vanishes just once in $(0, 1)$, at $x = (\mathtt{l}-2)/[(\mathtt{l}-1)(\mathtt{r}-1) - 1]$. Therefore $W(x)$ has exactly one root in $(0, 1)$. See also [25].

■
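
The parametric description in Corollary 1 and the single-minimum property of Fact 1 can be explored numerically. The following is a minimal Python sketch, assuming the (3,6)-regular pair $\lambda(x) = x^2$, $\rho(x) = x^5$; the grid search and all function names are illustrative choices and are not taken from the paper.

```python
# Sketch: BP threshold and BP EXIT curve of a regular LDPC ensemble over the BEC.
# Assumption: (l, r)-regular dd pair, lambda(x) = x^(l-1), rho(x) = x^(r-1),
# Lambda(x) = x^l. All names below are illustrative.

def eps_of_x(x, l, r):
    """Channel parameter eps(x) = x / lambda(1 - rho(1 - x))."""
    y = 1.0 - (1.0 - x) ** (r - 1)
    return x / y ** (l - 1)

def h_of_x(x, l, r):
    """EXIT value Lambda(1 - rho(1 - x))."""
    y = 1.0 - (1.0 - x) ** (r - 1)
    return y ** l

def bp_threshold(l, r, steps=100000):
    """Grid search for the minimum of eps(x) on (0, 1]; returns (x_bp, eps_bp)."""
    best_x, best_eps = 1.0, 1.0
    for k in range(1, steps + 1):
        x = k / steps
        e = eps_of_x(x, l, r)
        if e < best_eps:
            best_x, best_eps = x, e
    return best_x, best_eps

def bp_exit_curve(l, r, points=200):
    """Points (eps(x), h(x)) for x in [x_bp, 1]; h_BP is zero below eps_bp."""
    x_bp, _ = bp_threshold(l, r)
    xs = [x_bp + (1.0 - x_bp) * k / points for k in range(points + 1)]
    return [(eps_of_x(x, l, r), h_of_x(x, l, r)) for x in xs]

x_bp, eps_bp = bp_threshold(3, 6)
curve = bp_exit_curve(3, 6)
print(f"x_BP ~ {x_bp:.4f}, eps_BP ~ {eps_bp:.4f}")
```

For an ensemble with several jumps, the curve would have to be restricted to the union of intervals constructed above rather than to the single interval $[x^{BP}, 1]$ used in this sketch.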

A dynamic interpretation of the convergence of BP decoding
as the number of iterations $t \to \infty$ is shown in
Appendix IIVI using component EXIT curves. It is further 
shown in Appendix Hn] and Theorem \^\ how to compute the 
area under the BP EXIT curve. The calculations show that
this area is always larger than or equal to the design rate. Moreover,
some calculus reveals that, whenever the BP EXIT function
has discontinuities, the area is strictly larger than the
design rate $r$.

C. Extended BP EXIT Curve 



Fig. 6. EBP EXIT function $\{(\epsilon(x), \Lambda(y(x)))\}_{x=0}^{1}$.

Surprisingly, we can apply the Generalized Area Theorem also to BP decoding if we consider the Extended BP EXIT (EBP) curve. Fig. 6 shows this EBP EXIT curve for the running example, i.e., for the dd pair $(x^2, x^5)$. We will see shortly that this EBP EXIT curve plays a central role in our investigation. First, let us give its formal definition.

Definition 3: Assume we are given a dd pair $(\lambda, \rho)$. The EBP EXIT curve, denote it by $h^{EBP}$, is given in parametric form by

    $(\epsilon, h^{EBP}) = (\epsilon(x),\ \Lambda(1-\rho(1-x)))$,

where $\epsilon(x) = x/\lambda(1-\rho(1-x))$ and $x \in [0, 1]$.

Theorem 8 (Area Theorem for EBP Decoding): Assume we are given a dd pair $(\lambda, \rho)$ of design rate $r$. Then the EBP EXIT curve satisfies

    $\int_0^1 h^{EBP}(x)\, d\epsilon(x) = r$.

Proof: We will give two proofs of this fact. 



Fig. 7. Graph of a small tree code: computation tree of depth one for the regular (2,4) LDPC ensemble.

(i) The first proof applies only if $\epsilon(x) \le 1$ for $x \in (0, 1]$. This in turn happens only if $\lambda'(0) > 0$, i.e., if the ensemble has a non-trivial stability condition. We use the (General) Area Theorem for transmission over binary erasure channels where we allow the parameter of the channel to vary as a function of the bit position. First, let us assume that the ensemble is $(\mathtt{l}, \mathtt{r})$-regular. Consider a variable node and the corresponding computation tree of depth one as shown in Fig. 7. Let us further define two channel families. The first is the family $\{BEC(x)\}_{x=0}^{1}$. The second one is the family$^5$ $\{BEC(\epsilon(x))\}_{x=0}^{1}$, where $\epsilon(x) = x/\lambda(1-\rho(1-x))$. The two families are parametrized by a common parameter $x$ which is the fixed-point of density evolution: they are smooth since $\epsilon(x)$ is differentiable with respect to $x$. Let us now assume that the bit associated to the root node is passed through a channel $BEC(\epsilon(x))$, while the ones associated to the leaf nodes are passed through a channel $BEC(x)$. We can apply the General Area Theorem: let $X = (X_1, \dots, X_{1+\mathtt{l}(\mathtt{r}-1)})$ be the transmitted codeword chosen uniformly at random from the tree code and $Y(x)$ be the result of passing $X$ through the respective erasure channels parameterized by the common parameter $x$. The General Area Theorem states that $H(X \mid Y(x=1)) - H(X \mid Y(x=0)) = H(X)$ is equal to the sum of the integrals of the individual EXIT curves, where the integral extends from $x = 0$ to $x = 1$. There are two types of individual EXIT curves, namely the one associated to the root node, call it $h_{root}(x)$, and the $\mathtt{l}(\mathtt{r}-1)$ ones associated to the leaf nodes, call them $h_{leaf}(x)$. To summarize, the General Area Theorem states

    $H(X) = \int_0^1 h_{root}(x)\, d\epsilon(x) + \mathtt{l}(\mathtt{r}-1) \int_0^1 h_{leaf}(x)\, dx$.

Note that $H(X) = 1 + \mathtt{l}(\mathtt{r}-1) - \mathtt{l} = 1 + \mathtt{l}(\mathtt{r}-2)$ since the computation tree contains $1 + \mathtt{l}(\mathtt{r}-1)$ variable nodes and $\mathtt{l}$ check nodes. Moreover, $\int_0^1 h_{leaf}(x)\, dx = \int_0^1 (1 - \rho(1-x))\, dx = (\mathtt{r}-1)/\mathtt{r}$ since the message flowing from the root node to the check nodes is erased with probability $x$ (recall that $x = \epsilon(x)\lambda(1-\rho(1-x))$, where $(\lambda(x), \rho(x)) = (x^{\mathtt{l}-1}, x^{\mathtt{r}-1})$; moreover, observe that the result could also be obtained by applying the Area Theorem locally to the single-parity-check code). Collecting these observations and solving for $\int_0^1 h_{root}(x)\, d\epsilon(x)$, we get

    $\int_0^1 h_{root}(x)\, d\epsilon(x) = 1 + \mathtt{l}(\mathtt{r}-2) - \mathtt{l}(\mathtt{r}-1)^2/\mathtt{r} = 1 - \mathtt{l}/\mathtt{r} = r$,

as claimed since $h_{root} = h^{EBP}$. The irregular case follows in the same manner: we consider the ensemble of computation trees of depth one where the degree of the root node is chosen according to the node degree distribution $\Lambda(x)$ and each edge emanating from this root node is connected to a check node whose degree is chosen according to the edge degree distribution $\rho(x)$. As before, leaf nodes experience the channel $BEC(x)$, whereas the root node experiences the channel $BEC(\epsilon(x))$. We apply the General Area Theorem to each such choice and average with the respective probabilities.

5 Recall that $0 \le \epsilon(x) \le 1$ for all $x \in [0, 1]$ by assumption.
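(ii) The second proof applies in all cases. Applying integration by parts twice we can write

    $\int_0^1 h^{EBP}(x)\, d\epsilon(x) = h^{EBP}(x)\epsilon(x)\big|_{x=0}^{1} - \int_0^1 \frac{dh^{EBP}(x)}{dx}\, \epsilon(x)\, dx$
    $\overset{(a)}{=} 1 - \Lambda'(1) \int_0^1 x \rho'(1-x)\, dx$
    $= 1 - \frac{-x\rho(1-x)\big|_{x=0}^{1} + \int_0^1 \rho(1-x)\, dx}{\int_0^1 \lambda(x)\, dx}$
    $= 1 - \Lambda'(1)/\Gamma'(1) = r$,

where (a) follows since $h^{EBP}(x) = \Lambda'(1) \int_0^{1-\rho(1-x)} \lambda(u)\, du$ and $\Lambda'(1) = 1/\int_0^1 \lambda(x)\, dx$. Similar computations will be performed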
several times throughout this paper. In this respect it is handy 
to be able to refer to two basic facts related to this integration 
which are summarized as Lemma ^] and Lemma \15\ in 
Appendix IIII- Al ■ 
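
As a sanity check of Theorem 8, the area under the EBP EXIT curve can be approximated numerically and compared with the design rate. The sketch below is illustrative only: it assumes the irregular dd pair $(\lambda, \rho) = (0.4x + 0.6x^6, x^6)$ that appears later in Example 1, stores polynomials as exponent-to-coefficient dictionaries, and uses a plain trapezoidal rule along the parametric curve; none of these choices come from the paper.

```python
# Sketch: numerical check of the EBP area theorem on the BEC.
# Assumption: polynomials are stored as {exponent: coefficient} dictionaries;
# lam/rho are edge-perspective, Lam is node-perspective. Names are illustrative.

def poly(coeffs, x):
    return sum(c * x ** k for k, c in coeffs.items())

def integral01(coeffs):
    """Integral over [0, 1] of a polynomial."""
    return sum(c / (k + 1) for k, c in coeffs.items())

def node_perspective(lam):
    """Lambda(x) = int_0^x lambda / int_0^1 lambda."""
    norm = integral01(lam)
    return {k + 1: c / (k + 1) / norm for k, c in lam.items()}

def ebp_area(lam, rho, steps=20000, x_min=1e-9):
    """Trapezoidal approximation of int h_EBP(x) d eps(x) along the curve."""
    Lam = node_perspective(lam)
    area, prev = 0.0, None
    for k in range(steps + 1):
        x = x_min + (1.0 - x_min) * k / steps
        y = 1.0 - poly(rho, 1.0 - x)
        point = (x / poly(lam, y), poly(Lam, y))   # (eps(x), h_EBP(x))
        if prev is not None:
            area += 0.5 * (point[1] + prev[1]) * (point[0] - prev[0])
        prev = point
    return area

lam = {1: 0.4, 6: 0.6}   # lambda(x) = 0.4 x + 0.6 x^6
rho = {6: 1.0}           # rho(x) = x^6
design_rate = 1.0 - integral01(rho) / integral01(lam)
area = ebp_area(lam, rho)
print(design_rate, area)
```

Because the integral is taken along the parametric curve, segments on which $\epsilon(x)$ decreases contribute with a negative sign, which is exactly what distinguishes the EBP area from the area under the BP EXIT function.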

IV. An Upper-Bound for the Maximum A Posteriori Threshold

Assume that transmission takes place over $BEC(\epsilon)$. Given a dd pair $(\lambda, \rho)$, we trivially have the relations

    $\epsilon^{BP} \le \epsilon^{MAP} \le \min\{\epsilon^{Sh}, \epsilon^{Stab}\}$,    (7)

where $\epsilon^{Sh}$ and $\epsilon^{Stab}$ denote, respectively, the Shannon and stability threshold. As we have discussed, it is straightforward to compute $\epsilon^{BP}$ by means of DE, and $\epsilon^{BP} \le \epsilon^{MAP}$ follows from the sub-optimality of BP decoding. The inequality $\epsilon^{MAP} \le \epsilon^{Sh} = 1 - r$ is a rephrasing of the Channel Coding Theorem. Finally, $\epsilon^{MAP} \le \epsilon^{Stab} = 1/(\lambda'(0)\rho'(1))$ can be proved through the following graph-theoretic argument. Assume, by contradiction, that $\epsilon^{MAP} > \epsilon^{Stab}$ and let $\epsilon$ be such that $\epsilon^{Stab} < \epsilon < \epsilon^{MAP}$. Notice that $\epsilon^{Stab} < \epsilon$ is equivalent to $\epsilon\lambda'(0)\rho'(1) > 1$. Consider now the residual Tanner graph once the received variable nodes have been pruned, and focus on the subgraph of degree 2 variable nodes. Such a Tanner graph can be identified with an ordinary graph by mapping the check nodes to vertices and the variable nodes to edges. The average degree of such a graph is $\epsilon\lambda'(0)\rho'(1) > 1$ and therefore a finite fraction of its vertices belong to loops [26]. If a bit belongs to such a loop, it is not determined by the received message: in particular $E[X_i \mid Y] = 1/2$. In fact, there exists a codeword such that $x_i = 1$: just set $x_j = 1$ if $j$ belongs to some fixed loop through $i$ and $0$ otherwise. Since there is a finite fraction of such vertices, $h(\epsilon) > 0$ (if the limit exists) and therefore $\epsilon > \epsilon^{MAP}$. We reached a contradiction, therefore $\epsilon^{MAP} \le \epsilon^{Stab}$ as claimed.
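
The two simple quantities on the right-hand side of (7) are easy to evaluate. The following Python sketch is a minimal illustration, assuming edge-perspective polynomials stored as exponent-to-coefficient dictionaries (a representation chosen here for convenience, not one used in the paper); it evaluates $\epsilon^{Sh} = 1 - r$ and $\epsilon^{Stab} = 1/(\lambda'(0)\rho'(1))$ for the dd pair $(0.4x + 0.6x^6, x^6)$ considered in Example 1.

```python
# Sketch: the two simple upper bounds in (7) for a dd pair on the BEC.
# Assumption: edge-perspective polynomials stored as {exponent: coefficient}.

def integral01(coeffs):
    return sum(c / (k + 1) for k, c in coeffs.items())

def derivative_at(coeffs, x):
    return sum(k * c * x ** (k - 1) for k, c in coeffs.items() if k > 0)

def shannon_threshold(lam, rho):
    """eps_Sh = 1 - r with design rate r = 1 - int(rho) / int(lam)."""
    return integral01(rho) / integral01(lam)

def stability_threshold(lam, rho):
    """eps_Stab = 1 / (lambda'(0) rho'(1)); infinite if lambda'(0) = 0."""
    slope = derivative_at(lam, 0.0) * derivative_at(rho, 1.0)
    return float("inf") if slope == 0.0 else 1.0 / slope

lam = {1: 0.4, 6: 0.6}   # lambda(x) = 0.4 x + 0.6 x^6
rho = {6: 1.0}           # rho(x) = x^6
eps_sh = shannon_threshold(lam, rho)
eps_stab = stability_threshold(lam, rho)
print(eps_sh, eps_stab)
```

For ensembles without degree-2 variable nodes, such as the (3,6)-regular one, $\lambda'(0) = 0$ and the stability bound is vacuous, which the sketch reports as infinity.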

While $\epsilon^{Stab}$ and $\epsilon^{Sh}$ are simple quantities, the threshold $\epsilon^{MAP}$ is not as easy to compute. In this section we will prove an upper-bound on $\epsilon^{MAP}$ in terms of the (extended) BP EXIT curve. In the next sections, we will see that in fact this bound is tight for a large class of ensembles. The key to this bound is to associate the Area Theorem with the following intuitive inequality.

Lemma 1: Consider a dd pair $(\lambda, \rho)$ and the associated EXIT functions $h^{BP}$ and $h^{MAP}$. Then $h^{MAP} \le h^{BP}$.

Proof: Note that Lemma 1 expresses the natural statement that BP processing is in general suboptimal. For a given length $n$, pick a code at random from $LDPC(\lambda, \rho, n)$. Call $\Phi_i^{BP}$ the extrinsic BP estimate of bit $i$ and note that $\Phi_i^{BP} = \Phi_i^{BP}(Y_{\sim i})$, i.e., the extrinsic BP estimate is a well defined function of $Y_{\sim i}$. The Data Processing Theorem asserts that $H(X_i \mid Y_{\sim i}) \le H(X_i \mid \Phi_i^{BP}(Y_{\sim i}))$. This is true for all codes in $LDPC(\lambda, \rho, n)$. Therefore taking first the average over the ensemble and second the limit when the blocklength $n \to \infty$ (assuming the limit of the MAP EXIT function exists), we get $h^{MAP}(\epsilon) \le h^{BP}(\epsilon)$. ■

Because of Lemma 1 it is of course not surprising that the integral under $h^{BP}$ is larger than or equal to the asymptotic rate of the code $r_{as}$, as pointed out in Section III-B. In most of the cases encountered in practice, $r = r_{as}$ (see Section V): the area under the MAP EXIT curve is therefore $r$ and the area under the BP EXIT curve is strictly larger than $r$ if and only if the curve exhibits discontinuities (in the absence of discontinuities, the two curves coincide and the MAP/BP threshold is given by the stability condition).

Example 1 refines and illustrates this observation by showing that the BP and MAP threshold might be equal even if their respective EXIT functions are not pointwise equal.

Example 1: Consider the dd pair $(\lambda, \rho) = (0.4x + 0.6x^6, x^6)$ and the corresponding LDPC ensemble with design rate $r = 0.5$. Using a weight enumerator function, see, e.g., Section V, one can show that $r = r_{as} = \int_0^1 h^{MAP}$. A quick look shows that the BP threshold is given by the stability condition, i.e., it is $\epsilon^{BP} \approx 0.4167$ obtained for $x \to \underline{x}^0 = 0$. When the parameter is $\overline{x}^0 \approx 0.04828$, i.e., at $\epsilon^1 \approx 0.4691$, a discontinuity of the BP EXIT curve appears and the edge erasure probability $x$ "jumps" to $\underline{x}^1 \approx 0.3309$. This situation is shown in Fig. 8. Since the BP threshold is determined by the stability condition, as explained previously we have $\epsilon^{BP} = \epsilon^{MAP}$. This is true despite the fact that the integral under the BP EXIT is larger than $r = r_{as}$!

Recall that the Area Theorem asserts that $\int_0^1 h^{MAP}(\epsilon)\, d\epsilon = r_{as}$, where $r_{as}$ is the asymptotic rate of the ensemble defined in Theorem 7. By definition $h^{MAP}(\epsilon) = 0$ for $\epsilon < \epsilon^{MAP}$. Therefore we have in fact $\int_{\epsilon^{MAP}}^1 h^{MAP}(\epsilon)\, d\epsilon = r_{as}$. Now note that the BP decoder is in general suboptimal so that $h^{MAP}(\epsilon) \le h^{BP}(\epsilon)$. Further, in general $r_{as} \ge r(\lambda, \rho)$. Combining these statements we see that if $\bar{\epsilon}^{MAP}$ is a real number in $[\epsilon^{BP}, 1]$ such that $\int_{\bar{\epsilon}^{MAP}}^1 h^{BP}(\epsilon)\, d\epsilon = r(\lambda, \rho)$ then $\int_{\bar{\epsilon}^{MAP}}^1 h^{MAP}(\epsilon)\, d\epsilon \le r_{as}$. We conclude that for such a $\bar{\epsilon}^{MAP}$, $\epsilon^{MAP} \le \bar{\epsilon}^{MAP}$. Let us summarize a slightly strengthened version of this observation as a lemma.

Fig. 8. BP EXIT entropy curve with one discontinuity ($J = 1$) for which the BP threshold $\epsilon^{BP} = \epsilon^{MAP}$ is given by the stability condition: (a) channel entropy function $x \mapsto \epsilon(x)$; (b) BP EXIT function $\epsilon \mapsto h^{BP}(\epsilon)$.

Lemma 2 (First Upper Bound on $\epsilon^{MAP}$): Assume we are given a dd pair $(\lambda, \rho)$. Let $h^{BP}(\epsilon)$ denote the associated BP EXIT function and let $\bar{\epsilon}^{MAP}$ be the unique real number in $[\epsilon^{BP}, 1]$ such that $\int_{\bar{\epsilon}^{MAP}}^1 h^{BP}(\epsilon)\, d\epsilon = r(\lambda, \rho)$. Then $\epsilon^{MAP} \le \bar{\epsilon}^{MAP}$. If in addition $\bar{\epsilon}^{MAP} = \epsilon^{BP}$, then $\epsilon^{MAP} = \epsilon^{BP}$, and in fact $h^{MAP}(\epsilon) = h^{BP}(\epsilon)$ for all $\epsilon \in [0, 1]$.

Proof: We have already discussed the first part of the lemma. To see the second part, if $\bar{\epsilon}^{MAP} = \epsilon^{BP}$ then by (7) we have a lower and an upper bound that match and therefore we have equality. This can only happen if the two EXIT functions are in fact identical (and if $r_{as} = r(\lambda, \rho)$). ■

Example 2: For the dd pair $(\lambda(x), \rho(x)) = (x, x^3)$, we obtain $\bar{\epsilon}^{MAP} = 1/3 = \epsilon^{BP}$. Therefore, for this case the MAP EXIT function is equal to the BP EXIT function and in particular both decoders have equal thresholds.

Example 3: For the dd pair (A(x), p(x)) = (x 2 ,x 3 ), we 
obtain e MAP = 102 ^f* 1 « 0.647426. Note that this dd pair has 
rate 1/4 so that this upper bound on the threshold should be 
compared to the Shannon limit 3/4 = 0.75. 

Example 4: For the dd pair (X(x),p(x)) = 
(x 2 , x 5 ) of our running example, we get 

7- , /~2+a-b+ 4 

—MAP _ V V-l-a + b 



-1-r- 



/-l-a+6 



with a = ^ — r and b = (55 + 30 \/5l) 3 . Numerically, 

(11+6V5T) 3 

-map _ 0.4881508841915644. The Shannon threshold for this 
ensemble is 0.5. 

For a dd pair which exhibits a single jump the computation of this upper bound is made somewhat easier by the following lemma. Note that by Fact 1 this lemma is applicable to regular ensembles.

Lemma 3: Assume we are given a dd pair $(\lambda, \rho)$. Define the polynomial $y(x) = 1 - \rho(1 - x)$ and, for $x \in (0, 1]$, the function $\epsilon(x) = x/\lambda(y(x))$. Assume that $\epsilon(x)$ is increasing over $[x^{BP}, 1]$. Let $x^*$ be the unique root of the polynomial

    $P(x) \triangleq \Lambda'(1)\, x\, (1 - y(x)) - \frac{\Lambda'(1)}{\Gamma'(1)} [1 - \Gamma(1 - x)] + \epsilon(x) \Lambda(y(x))$,

in the interval $[x^{BP}, 1]$. Then $\bar{\epsilon}^{MAP} = \epsilon(x^*)$.

Proof: Recall that if $\epsilon(x)$ is increasing over $[x^{BP}, 1]$ then we have the parametric representation of $h^{BP}(\epsilon)$ as given in (6). Using the two integration lemmas stated in the Appendix, we can express the integral $\int_{\bar{\epsilon}^{MAP}}^1 h^{BP}(\epsilon)\, d\epsilon$ as a function of $\bar{\epsilon}^{MAP}$. More precisely, we parametrize $\bar{\epsilon}^{MAP}$ by $x$ and express the integral as a function of $x$. Equating the result to $r(\lambda, \rho) = 1 - \Lambda'(1)/\Gamma'(1)$ and solving for $x$ leads to the polynomial condition $P(x) = 0$ stated above. ■
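
The recipe of Lemma 3 can be sketched in a few lines of Python. The code below assumes an $(\mathtt{l}, \mathtt{r})$-regular ensemble, so that $\Lambda(x) = x^{\mathtt{l}}$ and $\Gamma(x) = x^{\mathtt{r}}$; the grid search for $x^{BP}$ and the bisection on $P(x)$ are illustrative numerical choices rather than the closed-form route taken in the examples.

```python
# Sketch: upper bound on the MAP threshold from Lemma 3 for an (l, r)-regular
# ensemble. Assumption: Lambda(x) = x^l, Gamma(x) = x^r, lambda(x) = x^(l-1),
# rho(x) = x^(r-1); grid search plus bisection are illustrative choices.

def y_of_x(x, r):
    return 1.0 - (1.0 - x) ** (r - 1)

def eps_of_x(x, l, r):
    return x / y_of_x(x, r) ** (l - 1)

def P(x, l, r):
    """P(x) = L'(1) x (1 - y) - L'(1)/G'(1) [1 - Gamma(1 - x)] + eps(x) Lambda(y)."""
    y = y_of_x(x, r)
    return (l * x * (1.0 - y)
            - (l / r) * (1.0 - (1.0 - x) ** r)
            + eps_of_x(x, l, r) * y ** l)

def map_upper_bound(l, r, steps=100000, iters=200):
    # x_BP: location of the minimum of eps(x) on (0, 1].
    x_bp = min((k / steps for k in range(1, steps + 1)),
               key=lambda x: eps_of_x(x, l, r))
    # Bisection, assuming P(x_BP) <= 0 < P(1) = design rate.
    lo, hi = x_bp, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if P(mid, l, r) < 0.0:
            lo = mid
        else:
            hi = mid
    x_star = 0.5 * (lo + hi)
    return x_star, eps_of_x(x_star, l, r)

x_star, eps_bar = map_upper_bound(3, 6)
print(f"x* ~ {x_star:.5f}, upper bound ~ {eps_bar:.5f}")
```

Bisection is adequate here because $P(x)$ is non-decreasing wherever $\epsilon(x)$ is increasing, which is the hypothesis of the lemma.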
Example 5: The following table compares the thresholds 
and bounds for various ensembles. Hereby A' 1 ' (a;) = x, 

\(2)/ T \ _ 7x 2 +2x a + lx i \(3)/™\ _ 2857t+3061.47z 2 +4081.533: 9 

y d ') 10 * \ 1 ~ 10000 

A (4) (a , ) = 7.71429^+2.28571.^ A< 5 ) (x) = ^^L. 

The threshold of the first ensemble is given by the stability condition. Its exact value is $5/28 \approx 0.1786$.



    lambda(x)          rho(x)             eps^BP    bar eps^MAP   eps^MAP   eps^Sh
    lambda^(1)(x)      (2x^5 + 3x^6)/5    0.1786    0.1786        0.1786    0.3048
    lambda^(2)(x)                         0.4236    0.4948        0.4948    0.5024
    lambda^(3)(x)                         0.4804    0.4935        0.4935    0.5000
    lambda^(4)(x)                         0.5955    0.6979        0.6979    0.7000
    lambda^(5)(x)                         0.3440    0.3899        0.3899    0.4000



The polynomial $P(x)$ provides in fact a fundamental characterization of the MAP threshold and has some important properties. These are more conveniently stated in terms of a slightly more general concept.

Definition 4: The trial entropy for the channel $BEC(\epsilon)$ associated to the dd pair $(\lambda, \rho)$ is the bi-variate polynomial

    $P_\epsilon(x, y) = \Lambda'(1)\, x\, (1 - y) - \frac{\Lambda'(1)}{\Gamma'(1)} [1 - \Gamma(1 - x)] + \epsilon \Lambda(y)$.



A few properties of the trial entropy are listed in the following.

Lemma 4: Let $(\lambda, \rho)$ be a dd pair and $P_\epsilon(x, y)$ the corresponding trial entropy. Consider furthermore the DE equations for the ensemble $x_{t+1} = \epsilon\lambda(y_t)$, $y_{t+1} = 1 - \rho(1 - x_t)$, $t$ being the iteration number. Then (in what follows we always consider $x, y \in [0, 1]$)

1) The fixed points of density evolution are stationary points of the trial entropy. Vice versa, any stationary point of the trial entropy is a fixed point of density evolution.

2) $P(x) = P_{\epsilon(x)}(x, y(x))$.

3) $P(x = 1) = P_{\epsilon=1}(x = 1, y = 1) = r(\lambda, \rho)$.

4) Let $a = (\epsilon_a = \epsilon(x_a), h^{EBP}(x_a))$ and $b = (\epsilon_b = \epsilon(x_b), h^{EBP}(x_b))$ be two points on the EBP EXIT curve (with $x_{a/b} \in (0, 1]$) and define $y_{a/b} = 1 - \rho(1 - x_{a/b})$. Then

    $\int_{x_a}^{x_b} h^{EBP}(x)\, d\epsilon(x) = P_{\epsilon_b}(x_b, y_b) - P_{\epsilon_a}(x_a, y_a)$.

Proof: 1) is proved by explicitly computing the partial derivatives of $P_\epsilon(x, y)$ with respect to $x$ and $y$: $\partial_x P_\epsilon(x, y) = \Lambda'(1)[1 - y - \rho(1 - x)]$, $\partial_y P_\epsilon(x, y) = \Lambda'(1)[-x + \epsilon\lambda(y)]$. Since $\Lambda'(1) > 0$, the stationarity conditions $\partial_x P_\epsilon(x, y) = 0$ and $\partial_y P_\epsilon(x, y) = 0$ are equivalent to the fixed point conditions for DE. 2) and 3) are elementary algebra. In order to prove 4), notice that we have $\partial_x P_\epsilon(x, y) = \partial_y P_\epsilon(x, y) = 0$ at any point $(x, y(x), \epsilon(x))$ along the EBP EXIT curve. This follows from the fact that points on the EBP EXIT curve are fixed points of density evolution. Therefore

    $\frac{d}{dx} P_{\epsilon(x)}(x, y(x)) = \Lambda(y(x)) \frac{d\epsilon(x)}{dx} = h^{EBP}(x) \frac{d\epsilon(x)}{dx}$.

The thesis follows by integrating over $x$. Equivalently, we
could have used again Lemmas and [H] ■ 
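
The first property of Lemma 4 lends itself to a quick numerical illustration. The Python sketch below assumes the (3,6)-regular ensemble and $\epsilon = 0.52$, runs density evolution to its fixed point, and approximates the partial derivatives of the trial entropy by central finite differences; the step size, iteration count and function names are arbitrary illustrative choices.

```python
# Sketch: a DE fixed point is a stationary point of the trial entropy.
# Assumption: (3,6)-regular ensemble and eps = 0.52; central finite differences
# stand in for the analytic partial derivatives. Names are illustrative.

L, R = 3, 6   # variable and check degrees

def trial_entropy(x, y, eps):
    """P_eps(x, y) = L'(1) x (1 - y) - L'(1)/G'(1) [1 - Gamma(1 - x)] + eps Lambda(y)."""
    return L * x * (1.0 - y) - (L / R) * (1.0 - (1.0 - x) ** R) + eps * y ** L

def de_fixed_point(eps, iters=5000):
    """Iterate x <- eps * lambda(1 - rho(1 - x)) starting from x = 1."""
    x = 1.0
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** (R - 1)) ** (L - 1)
    return x, 1.0 - (1.0 - x) ** (R - 1)

def gradient(x, y, eps, step=1e-6):
    dx = (trial_entropy(x + step, y, eps) - trial_entropy(x - step, y, eps)) / (2 * step)
    dy = (trial_entropy(x, y + step, eps) - trial_entropy(x, y - step, eps)) / (2 * step)
    return dx, dy

eps = 0.52
x, y = de_fixed_point(eps)
dx, dy = gradient(x, y, eps)
print(x, y, dx, dy)
```

Evaluating `gradient` at a point that is not a DE fixed point, for instance `(0.3, 0.5)`, gives clearly nonzero values, in line with the converse direction of the statement.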

Unfortunately, the upper-bound stated in Lemma 2 is not always tight. In particular, this can happen if the EBP EXIT curve exhibits multiple jumps (i.e., if $\epsilon(x)$ has more than one local maximum in the interval $(0, 1]$). We will state a precise sufficient condition for tightness in the next section. An improved upper bound is obtained as follows.

Theorem 9 (Improved Upper-Bound on $\epsilon^{MAP}$): Assume we are given a dd pair $(\lambda, \rho)$. Let $h^{EBP}(\epsilon)$ denote the associated EBP EXIT function and let $(\bar{\epsilon}^{MAP} = \epsilon(x^*), h^{EBP}(x^*))$ be a point on this curve. Assume that $\int_{x^*}^1 h^{EBP}(x)\, d\epsilon(x) = r(\lambda, \rho)$ and that there exists no $x' \in (x^*, 1]$ such that $\epsilon(x') = \epsilon(x^*)$. Then

    $\epsilon^{MAP} \le \bar{\epsilon}^{MAP}$.

The proof of this theorem will be given in Section fVll using 
the so-called Maxwell construction. Notice that in general there can be more than one value of $\epsilon$ satisfying the theorem hypotheses. We shall always use the symbol $\bar{\epsilon}^{MAP}$ to refer to the smallest such value. On the other hand, it is a consequence of the proof of the theorem that there always exists at least one such value.

As before, the following lemma simplifies the computation of the upper bound by giving a more explicit characterization.

Lemma 5: Consider a dd pair $(\lambda, \rho)$. Let $x^* \in (0, 1]$ be a root of the polynomial $P(x)$ defined in Lemma 3, such that there exists no $x' \in (x^*, 1]$ with $\epsilon(x') = \epsilon(x^*)$. Then $\epsilon^{MAP} \le \epsilon(x^*)$, and $\bar{\epsilon}^{MAP}$ is the smallest among such upper bounds.

Proof: Let $x^*$ be defined as in the statement. Then, by Lemma 4, points 2), 3) and 4):

    $\int_{x^*}^1 h^{EBP}(x)\, d\epsilon(x) = P(1) - P(x^*) = r(\lambda, \rho) - P(x^*)$.

Therefore, $\int_{x^*}^1 h^{EBP}(x)\, d\epsilon(x) = r(\lambda, \rho)$ if and only if $P(x^*) = 0$. ■
For a large family of dd pairs the upper bound stated in Theorem 9 is indeed tight. Nevertheless, it is possible to construct examples where we cannot evaluate the bound at all roots $x^*$ of $P(x)$ since for some of those roots there exists a point $x' \in (x^*, 1]$ with $\epsilon(x') = \epsilon(x^*)$. In these cases we expect the bound not to be tight. Indeed, we conjecture that the extra condition on the roots of $P(x)$ is not necessary and that the MAP threshold is in general given by the following statement.

Conjecture 1: Consider a degree distribution pair $(\lambda, \rho)$ and the associated polynomial $P(x)$ defined as in Lemma 3. Let $\mathcal{X} \subset (0, 1]$ be the set of positive roots of $P(x)$ in the interval $(0, 1]$ (since $P(x)$ is a polynomial, $\mathcal{X}$ is finite). Equivalently, $\mathcal{X}$ is the set of $x^* \in (0, 1]$ such that $\int_{x^*}^1 h^{EBP}(x)\, d\epsilon(x) = r(\lambda, \rho)$. Then $\epsilon^{MAP} = \min\{\epsilon(x^*);\ x^* \in \mathcal{X}\}$.

V. Counting Argument 

We will now describe a counting argument which yields an alternative proof of Lemma 2. More interestingly, the argument can be strengthened to obtain an easy-to-evaluate sufficient condition for tightness of the upper-bound.

The basic idea is quite simple. Recall that we define the 
MAP threshold as the maximum of all channel parameters for 
which the normalized conditional entropy converges to zero 
as the block length tends to infinity. For the binary erasure 
channel, the conditional entropy is equal to the logarithm 
of the number of codewords which are compatible with the 
received word. Therefore, a first naive way of upper bounding 
the MAP threshold consists in lower bounding the expected 
number of codewords in the residual graph, after eliminating 
the received variables. If, for a given channel parameter, this 
lower bound is exponential with a strictly positive exponent, 
then the corresponding conditional entropy is strictly positive 
and we are operating above the threshold. It turns out that 
a much better result is obtained by considering the residual 
graph after iterative decoding has been applied. In fact, this 
simple modification allows one to obtain matching upper and 
lower bounds in a large number of cases. 

Let $G$ be chosen uniformly at random from the ensemble characterized by $\Xi = (\Lambda, \Gamma)$. Assume further that transmission takes place over $BEC(\epsilon)$ and that a BP decoder is applied to the received sequence. Denote by $G(\epsilon)$ the residual graph after decoding has halted, and by $\Xi_{G(\epsilon)} = (\Lambda_{G(\epsilon)}, \Gamma_{G(\epsilon)})$ its degree profile (i.e., the fraction of nodes of any given degree). We adopt here the convention of normalizing the dd pair of $G(\epsilon)$ with respect to the number of variable nodes and check nodes in the original graph. Therefore, $\Lambda_{G(\epsilon)}(1) \le 1$ is the number of variable nodes in $G(\epsilon)$ divided by $n$. Analogously, $\Gamma_{G(\epsilon)}(1) \le 1$ is the number of check nodes in $G(\epsilon)$ divided by $n\Lambda'(1)/\Gamma'(1)$.

It is shown in [16] that, conditioned on the degree profile of the residual graph, $G(\epsilon)$ is uniformly distributed. The dd pair $\Xi_{G(\epsilon)}$ itself is of course a random quantity because of the channel randomness. However, it is sharply concentrated around its expected value. For increasing blocklengths this expected value converges to $\Xi_\epsilon = (\Lambda_\epsilon, \Gamma_\epsilon)$, which is given by$^6$

    $\Lambda_\epsilon(z) = \epsilon \Lambda(zy)$,    (8)

    $\Gamma_\epsilon(z) = \Gamma(1 - x + zx) - \Gamma(1 - x) - zx\Gamma'(1 - x)$.    (9)

Here, $x$ and $y$ denote the fraction of erased messages at the fixed point of the BP decoder. More precisely, $x \in [0, 1]$ is the largest solution of $x = \epsilon\lambda(1 - \rho(1 - x))$ and $y = 1 - \rho(1 - x)$. The precise concentration statement follows.

6 The standard dd pair from the node perspective of the residual graph when transmission takes place over $BEC(\epsilon)$ is then simply given by $\left( \frac{\Lambda_\epsilon(z)}{\Lambda_\epsilon(1)}, \frac{\Gamma_\epsilon(z)}{\Gamma_\epsilon(1)} \right)$.

Lemma 6: Let $\epsilon \in (0, 1]$ be a continuity point of $x(\epsilon)$ (we shall call such an $\epsilon$ non-exceptional). Then, for any $\xi > 0$,

    $\lim_{n \to \infty} \Pr\{ d(\Xi_{G(\epsilon)}, \Xi_\epsilon) \ge \xi \} = 0$.    (10)

Here, $d(\cdot, \cdot)$ denotes the $L_1$ distance.    (11)

The proof is deferred to Appendix Hll 

Under the zero-codeword assumption, the set of codewords 
compatible with the received bits coincides with the set of 
codewords of the residual graph. Their expected number can 
be computed through standard combinatorial tools. The key 
idea here is that, under suitable conditions on the dd pair (of 
the residual graph), the actual rate of codes from the (residual) 
ensemble is close to the design rate. We state here a slightly 
strengthened version of this result from [27]. 

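Lemma 7: Let $G$ be chosen uniformly at random from the ensemble $LDPC(n, \Xi) = LDPC(n, \Lambda, \Gamma)$, let $r_G$ be its rate and $r = 1 - \Lambda'(1)/\Gamma'(1)$ be the design rate. Consider the function $\Psi_\Xi(u)$,

    $\Psi_\Xi(u) = -\Lambda'(1) \log_2 \left[ \frac{1 + uv}{(1 + u)(1 + v)} \right] + \sum_{\mathtt{l}} \Lambda_{\mathtt{l}} \log_2 \left[ \frac{1 + u^{\mathtt{l}}}{2(1 + u)^{\mathtt{l}}} \right] + \frac{\Lambda'(1)}{\Gamma'(1)} \sum_{\mathtt{r}} \Gamma_{\mathtt{r}} \log_2 \left[ 1 + \left( \frac{1 - v}{1 + v} \right)^{\mathtt{r}} \right]$,    (12)

    $v = \left( \sum_{\mathtt{l}} \frac{\lambda_{\mathtt{l}}}{1 + u^{\mathtt{l}}} \right)^{-1} \left( \sum_{\mathtt{l}} \frac{\lambda_{\mathtt{l}} u^{\mathtt{l}-1}}{1 + u^{\mathtt{l}}} \right)$.    (13)

Assume that $\Psi_\Xi(u)$ takes on its global maximum in the range $u \in [0, \infty)$ at $u = 1$. Then there exists $B > 0$ such that, for any $\xi > 0$, and $n > n_0(\xi, \Xi)$,

    $\Pr\{ |r_G - r(\Lambda, \Gamma)| > \xi \} \le e^{-Bn\xi}$.

Moreover, there exists $C > 0$ such that, for $n > n_0(\xi, \Xi)$,

    $E[|r_G - r(\Lambda, \Gamma)|] \le C \frac{\log n}{n}$.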

Proof: The idea of the proof is the following. For any parity-check ensemble we have $r_G \ge r(\Lambda, \Gamma)$. If it is true that the expected value of the rate (more precisely, the logarithm of the expected number of codewords divided by the length) is close to the design rate, then we can use the Markov inequality to show that most codes have rate close to the design rate.

Let us start by computing the exponent of the expected number of codewords. We know from [27]-[36] that the expected number of codewords involving $E$ edges is given by

    $E[N_G(E)] = \frac{\mathrm{coef}\left\{ \prod_{\mathtt{l}} (1 + u^{\mathtt{l}})^{n\Lambda_{\mathtt{l}}} \prod_{\mathtt{r}} q_{\mathtt{r}}(v)^{n \frac{\Lambda'(1)}{\Gamma'(1)} \Gamma_{\mathtt{r}}},\ u^E v^E \right\}}{\binom{n\Lambda'(1)}{E}}$,

where $q_{\mathtt{r}}(v) = ((1 + v)^{\mathtt{r}} + (1 - v)^{\mathtt{r}})/2$. Let $n$ tend to infinity and define $e = E/(n\Lambda'(1))$. From standard arguments presented in the cited papers it is known that, for a fixed $e$, the exponent $\lim_{n \to \infty} \frac{1}{n} \log_2 E[N_G(en\Lambda'(1))]$ is given by the infimum with respect to $u, v > 0$ of

    $\sum_{\mathtt{l}} \Lambda_{\mathtt{l}} \log_2(1 + u^{\mathtt{l}}) - \Lambda'(1) e \log_2 u + \frac{\Lambda'(1)}{\Gamma'(1)} \sum_{\mathtt{r}} \Gamma_{\mathtt{r}} \log_2 q_{\mathtt{r}}(v) - \Lambda'(1) e \log_2 v - \Lambda'(1) h(e)$.    (14)

We want to determine the exponent corresponding to the expected number of codewords, i.e., $\lim_{n \to \infty} \frac{1}{n} \log_2 E[N_G]$, where $N_G = \sum_E N_G(E)$. Since there is only a polynomial number of "types" (numbers $E$) this exponent is equal to the supremum of (14) over all $0 \le e \le 1$. In summary, the sought-after exponent is given by a stationary point of the function stated in (14) with respect to $u$, $v$ and $e$.

Take the derivative with respect to $e$. This gives $e = uv/(1 + uv)$. If we substitute this expression for $e$ into (14), subtract the design rate $r(\Lambda, \Gamma)$, and rearrange the terms somewhat we get (12). Next, if we take the derivative with respect to $u$ and solve for $v$ we get (13). In summary, $\Psi_\Xi(u)$ is a function so that

    $\log_2 E[N_G] = n \left\{ r(\Lambda, \Gamma) + \sup_{u \in [0, \infty)} \Psi_\Xi(u) + \omega_n \right\}$,

where $\omega_n = o(1)$. In particular, by explicit computation we see that $\Psi_\Xi(u = 1) = 0$. A closer look shows that $u = 1$ corresponds to the exponent of codewords of weight $n/2$. Therefore, the condition that the global maximum of $\Psi_\Xi(u)$ is achieved at $u = 1$ is equivalent to the condition that the expected weight enumerator is dominated by codewords of weight (close to) $n/2$. Therefore,

    $\Pr\{ r_G \ge r(\Lambda, \Gamma) + \xi \} = \Pr\{ N_G \ge 2^{n(\xi - \omega_n)} E[N_G] \} \le e^{-Bn\xi}$,

where the last step follows from the Markov inequality if $B = (\log 2)/2$ and $\omega_n \le \xi/2$ for any $n > n_0$.

Finally, we observe that, since $r_G \le 1$,

    $E[|r_G - r(\Lambda, \Gamma)|] \le \xi + e^{-Bn\xi}$,

and the second claim follows by choosing $\xi = \log n/(Bn)$. ■
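
The hypothesis of Lemma 7 can be inspected numerically by evaluating the function of (12) with $v$ given by (13). The Python sketch below is illustrative only: it assumes a dd pair given from the node perspective as degree-to-fraction dictionaries, uses the (3,6)-regular ensemble as input, and samples $u$ on a uniform grid in $[0, 1]$; these choices are not taken from the paper.

```python
import math

# Sketch: the function Psi of Eqs. (12)-(13) for a dd pair given from the node
# perspective. Assumption: Lam and Gam are {degree: fraction} dictionaries.

def psi(u, Lam, Gam):
    avg_l = sum(l * p for l, p in Lam.items())          # Lambda'(1)
    avg_r = sum(r * p for r, p in Gam.items())          # Gamma'(1)
    lam = {l: l * p / avg_l for l, p in Lam.items()}    # edge perspective lambda_l
    # Eq. (13): v as a function of u.
    num = sum(c * u ** (l - 1) / (1.0 + u ** l) for l, c in lam.items())
    den = sum(c / (1.0 + u ** l) for l, c in lam.items())
    v = num / den
    # Eq. (12), term by term.
    t1 = -avg_l * math.log2((1.0 + u * v) / ((1.0 + u) * (1.0 + v)))
    t2 = sum(p * math.log2((1.0 + u ** l) / (2.0 * (1.0 + u) ** l))
             for l, p in Lam.items())
    t3 = (avg_l / avg_r) * sum(
        p * math.log2(1.0 + ((1.0 - v) / (1.0 + v)) ** r) for r, p in Gam.items())
    return t1 + t2 + t3

Lam, Gam = {3: 1.0}, {6: 1.0}           # (3,6)-regular ensemble
grid = [k / 1000 for k in range(1001)]  # sample points u in [0, 1]
values = [psi(u, Lam, Gam) for u in grid]
print(max(values), psi(1.0, Lam, Gam), psi(0.0, Lam, Gam))
```

At $u = 0$ the function equals minus the design rate, which corresponds to the all-zero codeword, while at $u = 1$ it vanishes, in agreement with the explicit computation mentioned in the proof.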
We would like to apply this result to the residual graph $G(\epsilon)$. Since the degree profile of $G(\epsilon)$ is a random variable, we need a preliminary observation on the "robustness" of the hypotheses in Lemma 7.

Lemma 8: Let $\Psi_\Xi(\cdot)$ be defined as in Lemma 7. Then $\Psi_\Xi(u)$ achieves its maximum over $u \in [0, +\infty)$ in $[0, 1]$. Moreover, there exists a constant $A > 0$ such that, for any two degree distribution pairs $\Xi = (\Lambda, \Gamma)$ and $\tilde{\Xi} = (\tilde{\Lambda}, \tilde{\Gamma})$, and any $u \in [0, 1]$,

    $|\Psi_\Xi(u) - \Psi_{\tilde{\Xi}}(u)| \le A\, d(\Xi, \tilde{\Xi})\, (1 - u)^2$.    (15)
For the proof we refer to Appendix ITU 

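We turn now to the main result of this section.

Theorem 10: Let $G$ be a code picked uniformly at random from the ensemble $LDPC(n, \Lambda, \Gamma)$ and let $H_G(X \mid Y)$ be the conditional entropy of the transmitted message when the code is used for communicating over $BEC(\epsilon)$. Denote by $P_\epsilon(x, y)$ the corresponding trial entropy. Let $\Xi_\epsilon = (\Lambda_\epsilon, \Gamma_\epsilon)$ be the typical degree distribution pair of the residual graph, see Eqs. (8), (9), and $\Psi_{\Xi_\epsilon}(u)$ be defined as in Lemma 7, Eq. (12). Assume that $\Psi_{\Xi_\epsilon}(u)$ achieves its global maximum as a function of $u \in [0, \infty)$ at $u = 1$, with $\Psi''_{\Xi_\epsilon}(1) < 0$, and that $\epsilon$ is non-exceptional. Then

    $\lim_{n \to \infty} \frac{1}{n} E[H_G(X \mid Y)] = P_\epsilon(x, y)$,    (16)

where $x \in [0, 1]$ is the largest solution of $x = \epsilon\lambda(1 - \rho(1 - x))$ and $y = 1 - \rho(1 - x)$.

Proof: As above, we denote by $G(\epsilon)$ the residual graph after BP decoding and by $r_{G(\epsilon)}$ its rate normalized to the original blocklength $n$. Notice that $H_G(X \mid Y) = n r_{G(\epsilon)}$: iterative decoding does not exclude any codeword compatible with the received bits. Furthermore, the design rate (always normalized to $n$) for the dd pair of the residual graph is

    $r(\Xi_{G(\epsilon)}) = \Lambda_{G(\epsilon)}(1) - \frac{\Lambda'(1)}{\Gamma'(1)} \Gamma_{G(\epsilon)}(1)$.

We further introduce the notation $r_\epsilon$ for the design rate of the typical dd pair of the residual graph. Using Eqs. (8) and (9), we can find

    $r_\epsilon = \Lambda'(1) \rho(1 - x) x - \frac{\Lambda'(1)}{\Gamma'(1)} [1 - \Gamma(1 - x)] + \epsilon \Lambda(y) = P_\epsilon(x, y)$,

where the last step follows from the fixed-point condition $y = 1 - \rho(1 - x)$.

Since by assumption $\Psi_{\Xi_\epsilon}(u)$ achieves its global maximum at $u = 1$, with $\Psi''_{\Xi_\epsilon}(1) < 0$, and $\Psi_{\Xi_\epsilon}(1) = 0$, there exists a positive constant $\delta$ such that $\Psi_{\Xi_\epsilon}(u) \le -\delta(1 - u)^2$ for any $u \in [0, 1]$. As a consequence of Lemma 8 there exists a $\xi > 0$ such that, for any dd pair $\tilde{\Xi}$ with $d(\tilde{\Xi}, \Xi_\epsilon) \le \xi$, $\Psi_{\tilde{\Xi}}(u) \le -\delta(1 - u)^2/2$ for $u \in [0, 1]$.

Let $\Pr_\epsilon(\tilde{\Xi})$ be the probability that the degree distribution pair of the residual graph $G(\epsilon)$ is $\tilde{\Xi} = (\tilde{\Lambda}, \tilde{\Gamma})$. Denote by $E_{\tilde{\Xi}}$ expectation with respect to a uniformly random code in the $(\tilde{n}, \tilde{\Lambda}, \tilde{\Gamma})$ ensemble (here $\tilde{n} = n\tilde{\Lambda}(1)$). Denote by $\mathcal{N}(\xi)$ the set of dd pairs $\tilde{\Xi}$ such that $d(\tilde{\Xi}, \Xi_\epsilon) \le \xi$. The above remarks imply that we can apply Lemma 7 to any ensemble in $\mathcal{N}(\xi)$. Then

    $\frac{1}{n} E[H_G(X \mid Y)] = \sum_{\tilde{\Xi}} \Pr_\epsilon(\tilde{\Xi}) E_{\tilde{\Xi}}[r_{G(\epsilon)}] = \sum_{\tilde{\Xi} \in \mathcal{N}(\xi)} \Pr_\epsilon(\tilde{\Xi}) E_{\tilde{\Xi}}[r_{G(\epsilon)}] + \omega(n, \xi)$.

The remainder can be estimated by noticing that $r_{G(\epsilon)} \le 1$ while the probability of $\tilde{\Xi} \notin \mathcal{N}(\xi)$ is bounded by Lemma 6. Therefore

    $\lim_{n \to \infty} \omega(n, \xi) = 0$.

Now we can apply Lemma 7 to get

    $\left| \frac{1}{n} E[H_G(X \mid Y)] - r_\epsilon \right| \le \sum_{\tilde{\Xi} \in \mathcal{N}(\xi)} \Pr_\epsilon(\tilde{\Xi}) \left| E_{\tilde{\Xi}}[r_{G(\epsilon)}] - r(\tilde{\Xi}) \right| + \sum_{\tilde{\Xi} \in \mathcal{N}(\xi)} \Pr_\epsilon(\tilde{\Xi}) \left| r(\tilde{\Xi}) - r_\epsilon \right| + \omega(n, \xi)$
    $\le \sum_{\tilde{\Xi} \in \mathcal{N}(\xi)} \Pr_\epsilon(\tilde{\Xi}) \left| r(\tilde{\Xi}) - r_\epsilon \right| + \omega'(n, \xi)$,

where $\omega'(n, \xi) = \omega(n, \xi) + C \log n/n$. Notice that there exists $B > 0$ such that for any pair $\Xi_1$, $\Xi_2$

    $|r(\Xi_1) - r(\Xi_2)| \le B\, d(\Xi_1, \Xi_2)$.

Therefore,

    $\lim_{n \to \infty} \left| \frac{1}{n} E[H_G(X \mid Y)] - r_\epsilon \right| \le B\xi$.

The claim follows by noticing that $\xi$ can be chosen arbitrarily small. ■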
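
The identity between the design rate of the typical residual graph and the trial entropy, which is the core of the proof above, can be checked numerically. The following Python sketch assumes the (3,6)-regular ensemble, evaluates the residual degree profile of Eqs. (8) and (9) at the BP fixed point, and compares its normalized design rate with $P_\epsilon(x, y)$; the function names and the plain fixed-point iteration are illustrative choices.

```python
# Sketch: residual degree profile (8)-(9) and its design rate for the
# (3,6)-regular ensemble. Assumption: Lambda(z) = z^3, Gamma(z) = z^6.

L, R = 3, 6

def de_fixed_point(eps, iters=5000):
    """Largest solution of x = eps * lambda(1 - rho(1 - x)), found by iteration."""
    x = 1.0
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** (R - 1)) ** (L - 1)
    return x, 1.0 - (1.0 - x) ** (R - 1)

def residual_profile(eps):
    """Return callables Lambda_eps(z), Gamma_eps(z) from Eqs. (8) and (9)."""
    x, y = de_fixed_point(eps)
    Lam_eps = lambda z: eps * (z * y) ** L
    Gam_eps = lambda z: ((1.0 - x + z * x) ** R - (1.0 - x) ** R
                         - z * x * R * (1.0 - x) ** (R - 1))
    return Lam_eps, Gam_eps

def residual_design_rate(eps):
    """Design rate of the residual graph, normalized to the original length n."""
    Lam_eps, Gam_eps = residual_profile(eps)
    return Lam_eps(1.0) - (L / R) * Gam_eps(1.0)

def trial_entropy(eps):
    """P_eps(x, y) evaluated at the BP fixed point."""
    x, y = de_fixed_point(eps)
    return L * x * (1.0 - y) - (L / R) * (1.0 - (1.0 - x) ** R) + eps * y ** L

print(residual_design_rate(0.52), trial_entropy(0.52))
```

Below the BP threshold the iteration collapses to $x = 0$, the residual graph is empty, and both quantities vanish; whether the value above the threshold equals the true conditional entropy still depends on the hypotheses of Theorem 10.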

Theorem 10 allows us to compute the exact MAP threshold whenever the required conditions are verified. An explicit characterization is given below.

Corollary 2: Consider transmission over $BEC(\epsilon)$ using elements picked uniformly at random from the ensemble $(\Lambda, \Gamma)$. Let $x^*, y^* > 0$ be the DE fixed-point achieved by the BP decoder at a non-exceptional erasure probability $\epsilon^*$ (i.e., $x^* \in (0, 1]$ is the largest solution of $x^* = \epsilon^* \lambda(1 - \rho(1 - x^*))$). Assume that $P_{\epsilon^*}(x^*, y^*) = 0$ and that $\Psi_{\Xi_{\epsilon^*}}(u) \le 0$ for $u \in [0, +\infty)$ together with $\Psi''_{\Xi_{\epsilon^*}}(1) < 0$. Let $W \subset [0, +\infty)$ be the set of points $u \ne 1$ such that $\Psi_{\Xi_{\epsilon^*}}(u) = 0$. If, for any $u \in W$, $\partial_\epsilon \Psi_{\Xi_\epsilon}(u)|_{\epsilon = \epsilon^*} < \partial_\epsilon \Psi_{\Xi_\epsilon}(1)|_{\epsilon = \epsilon^*}$, then $\epsilon^{MAP} = \epsilon^*$.

Proof: We claim that there exists a $\delta > 0$ such that the hypotheses of Theorem 10 are verified for any $\epsilon \in (\epsilon^*, \epsilon^* + \delta)$. Before proving this claim, let us show that it implies the thesis. Consider any $\epsilon \in (\epsilon^*, \epsilon^* + \delta)$ and let $x$, $y$ be the corresponding density evolution fixed point. Then

    $\lim_{n \to \infty} \frac{1}{n} E[H(X \mid Y)] = P_\epsilon(x(\epsilon), y(\epsilon)) \quad \forall \epsilon \in (\epsilon^*, \epsilon^* + \delta)$.

Moreover $P_{\epsilon^*}(x(\epsilon^*), y(\epsilon^*)) = 0$ by hypothesis and

    $\frac{d}{d\epsilon} P_\epsilon(x(\epsilon), y(\epsilon)) = \Lambda(y(\epsilon)) > 0$.

Therefore $P_\epsilon(x(\epsilon), y(\epsilon)) > 0$ for any $\epsilon > \epsilon^*$. This implies $\epsilon^{MAP} \le \epsilon^*$. On the other hand $E[H(X \mid Y)]$ is strictly increasing with $\epsilon$. This implies

    $\lim_{n \to \infty} \frac{1}{n} E[H(X \mid Y)] = 0, \quad \forall \epsilon \in [0, \epsilon^*]$,

which in turn implies $\epsilon^{MAP} \ge \epsilon^*$ and, therefore, $\epsilon^{MAP} = \epsilon^*$.

Let us now prove the claim. By assumption $\epsilon^*$ is non-exceptional and therefore the residual dd pair $\Xi_\epsilon$ is continuous at $\epsilon^*$. This implies, via Lemma 8, that, for any $\xi > 0$, there exists $\delta$ such that for $\epsilon \in [\epsilon^*, \epsilon^* + \delta)$ and any $u \in [0, 1]$,

    $|\Psi_{\Xi_\epsilon}(u) - \Psi_{\Xi_{\epsilon^*}}(u)| \le \xi (1 - u)^2$.

Together with $\Psi''_{\Xi_{\epsilon^*}}(1) < 0$, this implies that, if $\delta$ is small enough, $u = 1$ is a local maximum of $\Psi_{\Xi_\epsilon}(u)$. It follows from the hypotheses on $\partial_\epsilon \Psi_{\Xi_\epsilon}(u)$, $u \in W$, that it is also a global maximum. ■

Fig. 9. (E)BP EXIT function $h^{EBP}(\epsilon)$.

Fig. 11. (E)BP EXIT function $h^{EBP}(\epsilon)$.

The conditions in the above corollary are relatively easy to 
verify. Let us demonstrate this by means of two examples. 

Example 6 (Ensemble LDPC($x^2$, $x^5$)): Consider the (3,6)-regular
LDPC ensemble. For the convenience of the reader its EBP
EXIT curve is repeated in Fig. 9.

Let us apply Theorem 10. We start with $\epsilon_A = 1$ (point
A). The residual degree distribution at this point corresponds
of course to the (3, 6)-ensemble itself. As shown in the left-most
picture in Fig. 10, the corresponding function $\Psi_{\Xi}(u)$
has only a single maximum at u = 1 and one can verify
that $\Psi''_{\Xi}(1) < 0$. Therefore, by Lemma Q we know that with
high probability the rate of a randomly chosen element from
this ensemble is close to the design rate. Next, consider the
point $\epsilon_B = 0.52$ (point B). Again, the conditions are verified,
and therefore the conditional entropy at this point is given
by equation OJ. We get $H(X | Y(\epsilon_B)) \approx 0.02755$. Finally,
consider the "critical" point $\epsilon_C \approx 0.48815$. As one can see
from the right-most picture in Fig. 10, this is the point at
which a second global maximum appears. Just to the right
of this point the conditions of Theorem 10 are still fulfilled,
whereas to the left of it they are violated. Further, at this
point Eq. lO states that $H(X | Y(\epsilon_C)) = 0$. We conclude that
$\epsilon^{MAP} = \epsilon_C \approx 0.48815$, confirming our result from Example 4.
Since the bound is tight at the MAP threshold it follows that
$h^{MAP} = h^{BP}$ for all points "to the right" of the MAP threshold
(this is true since $h^{MAP} \le h^{BP}$ always, and the tightness of
the bound at the MAP threshold shows that the area under
$h^{BP}$ is exactly equal to the rate). We see that in this simple
case Theorem 10 allows us to construct the complete MAP
EXIT curve.

Fig. 10. Function $\Psi_{\Xi}(u)$ for the dd pair formed by the residual ensemble
in A, B and C.
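
The two numbers quoted in Example 6 can be checked numerically. The sketch below is only an illustration: it assumes the trial entropy has the standard form $P_\epsilon(x, y) = \Lambda'(1) x (1 - y) - \frac{\Lambda'(1)}{\Gamma'(1)} [1 - \Gamma(1 - x)] + \epsilon \Lambda(y)$, which is not restated in this section, and it specializes to the (3,6)-regular ensemble ($\Lambda(x) = x^3$, $\Gamma(x) = x^6$, $\lambda(x) = x^2$, $\rho(x) = x^5$). The function names and the bisection bracket are hypothetical choices.

```python
def de_fixed_point(eps, tol=1e-14, max_iter=200_000):
    """Largest density evolution fixed point for the (3,6)-regular ensemble.

    Iterates x <- eps * lambda(1 - rho(1 - x)) with lambda(x) = x^2 and
    rho(x) = x^5, starting from x = eps (an upper bound on the fixed point).
    """
    x = eps
    for _ in range(max_iter):
        y = 1.0 - (1.0 - x) ** 5
        x_new = eps * y ** 2
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    y = 1.0 - (1.0 - x) ** 5
    return x, y


def trial_entropy(eps):
    """Assumed trial entropy P_eps(x, y) evaluated at the DE fixed point.

    Uses Lambda(x) = x^3 and Gamma(x) = x^6, so Lambda'(1) = 3, Gamma'(1) = 6.
    """
    x, y = de_fixed_point(eps)
    return 3.0 * x * (1.0 - y) - 0.5 * (1.0 - (1.0 - x) ** 6) + eps * y ** 3


def map_threshold(lo=0.45, hi=0.52, iters=60):
    """Bisection for the zero of the trial entropy.

    The bracket [lo, hi] is an assumption: it lies above the BP threshold
    of the (3,6) ensemble, where the non-trivial fixed point exists.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if trial_entropy(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Under these assumptions, `trial_entropy(0.52)` plays the role of the conditional entropy at point B and `map_threshold()` the role of the critical point C of the example.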

Example 7 (Ensemble LDPC($\frac{3x + 3x^2 + 4x^{13}}{10}$, $x^6$)): Consider
the ensemble described in Fig. 3. Its EBP EXIT curve is
repeated for the convenience of the reader in Fig. 11. The
corresponding BP EXIT curve is shown in detail in Fig. 5. A
further discussion of this ensemble can be found in Example
10. Let us again apply Theorem 10. We start with $\epsilon_A = 1$
(point A). The residual degree distribution corresponds of
course to the ensemble itself. As the top left-most picture in
Fig. 12 shows, the hypotheses are fulfilled and we conclude
again that with high probability the rate of a randomly
chosen element from this ensemble is close to the design
rate, which is equal to $r \approx 0.4872$. Now decrease $\epsilon$ smoothly.
The conditions of Theorem 10 stay fulfilled until we get
to $\epsilon_B \approx 0.5313$ (point B). At this point a second global
maximum of the function $\Psi_{\Xi}(u)$ occurs. As the pictures in
the bottom row of Fig. 12 show, the hypotheses of Theorem
10 are again fulfilled over the whole segment from E (the first
threshold of the BP decoder corresponding to $\epsilon_E \approx 0.5156$)
till G. In particular, at the point G, which corresponds to
$\epsilon_G = \epsilon^{MAP} \approx 0.4913$, the trial entropy reaches zero, which
shows that this is the MAP threshold.

Fig. 12. Function $\Psi_{\Xi}(u)$ for the dd pair formed by the residual ensemble
in A, B, C, E, F and G.

We see that for this example Theorem 10 allows us to
construct the MAP EXIT curve for the segment from A to
B and the segment from E to G. Over both these segments
we have $h^{MAP} = h^{BP}$. In summary, we can determine the MAP
threshold and we see that the balance condition applies "at
the jump G" (the MAP threshold). But the straightforward
application of Theorem 10 does not provide us with a means of
determining $h^{MAP}$ between the points B and D. Intuitively, $h^{MAP}$
should go from B to C (which corresponds to $\epsilon_C \approx 0.5156$). At
this point one would hope that a local balance condition again
applies and that the MAP EXIT curve jumps to the "lower
branch" to point D. It should then continue smoothly until the
point G (the MAP threshold) at which it finally jumps to zero.
As we will discuss in more detail in Example ^| after our 
analysis of the M decoder, this is indeed true, and $h^{MAP}$ is as
shown in Fig. 3.

Assuming Theorem 10 applies, we know that at the MAP
threshold the matrix corresponding to the residual graph becomes
a full rank square matrix. What happens at the jump at
point C? At this point the matrix corresponding to the residual
graph takes, after some suitable swapping of columns and
rows, the form

$$\begin{pmatrix} U & V \\ 0 & W \end{pmatrix}$$

where W is a full rank square matrix of dimension
$\epsilon_C (\Lambda(y_C) - \Lambda(y_D))$. The MAP decoder can therefore solve
the part of the equation corresponding to the submatrix W.

VI. Maxwell Construction 

The balance condition described in Section IT-SI and Section 
IIVI is strongly reminiscent of the well-known "Maxwell con- 
struction" in the theory of phase transitions. This is described 
briefly in Fig. 13.
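
The equal-area rule behind the Maxwell construction can be illustrated numerically on the reduced Van der Waals isotherm $(p + 3/V^2)(3V - 1) = 8T$ at $T = 0.85$ quoted in the caption of Fig. 13. The following is a minimal sketch, not part of the original analysis; the helper names and all bracketing intervals are assumptions tuned to this temperature.

```python
import math


def pressure(v, t):
    """Reduced Van der Waals isotherm: (P + 3/V^2)(3V - 1) = 8T."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / (v * v)


def pressure_integral(v, t):
    """Antiderivative of pressure(v, t) with respect to v."""
    return (8.0 * t / 3.0) * math.log(3.0 * v - 1.0) + 3.0 / v


def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def maxwell_pressure(t=0.85):
    """Return (P, V_liquid, V_gas) satisfying the equal-area rule.

    The brackets 0.34, 1.0, 10.0 and 100.0 are assumptions that hold for
    t = 0.85; other temperatures may need different brackets.
    """
    def slope(v):
        return -24.0 * t / (3.0 * v - 1.0) ** 2 + 6.0 / v ** 3

    v_min = bisect(slope, 0.34, 1.0)   # local minimum of the isotherm
    v_max = bisect(slope, 1.0, 10.0)   # local maximum of the isotherm

    def volumes(p):
        v_liq = bisect(lambda v: pressure(v, t) - p, 0.34, v_min)
        v_gas = bisect(lambda v: pressure(v, t) - p, v_max, 100.0)
        return v_liq, v_gas

    def area_gap(p):
        # Integral of P dV along the isotherm minus the area under the
        # horizontal line; zero exactly when the two lobes have equal area.
        v_liq, v_gas = volumes(p)
        work = pressure_integral(v_gas, t) - pressure_integral(v_liq, t)
        return work - p * (v_gas - v_liq)

    p_lo = max(pressure(v_min, t), 0.0) + 1e-9
    p_hi = pressure(v_max, t) - 1e-9
    p = bisect(area_gap, p_lo, p_hi)
    return (p,) + volumes(p)
```

The horizontal line is placed where the integral of $P\,dV$ along the theoretical curve equals the work along the constant-pressure segment, which is the same balance idea used for the EBP EXIT curve.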

A. Maxwell Decoder 

Inspired by the statistical mechanics analogy, we will ex- 
plain the balance condition (shown on the right in Fig. 
which determines the MAP threshold by analyzing a "BP 
decoder with guessing". The state of the algorithm can be 
associated to a point moving along the EBP EXIT curve. The 
evolution starts at the point of full entropy and ends at zero 
entropy. The analysis of this algorithm is also most conve- 
niently phrased in terms of the EBP EXIT curve and implies 
a proof of Theorem 9. Because of this balance condition we
term this decoding algorithm the Maxwell (M) decoder. Note 
that a similar algorithm is discussed in [13] although it is 
motivated by some more practical concerns. 

Analogously to the usual BP decoder for the erasure chan- 
nel, the M decoder admits two equivalent descriptions: either 
as a sequential (i.e., bit-by-bit in the spirit of [16]) or as 
a message-passing algorithm. While the former approach is 
more intuitive, the latter allows for a simpler analysis. We 
shall first describe the M decoder as a sequential procedure 
and sketch the main features of its behavior. In the next section 
we will turn to a message-passing setting and complete its 
analysis. 

Given the received word which was transmitted over
BEC($\epsilon$), the decoder proceeds iteratively as does the standard
BP decoder. At each time step a parity-check equation involving
a single undetermined variable is chosen and used to
determine the value of the variable. This value is substituted
in any parity-check equation involving the same variable. If at
any time the iterative decoding process gets stuck in a non-empty
stopping set, a position $i \in [n]$ is chosen uniformly at
random. The decoder is said to guess a bit. If the bit associated
to this position is not known yet, the decoder replicates$^7$ any
running copy of the decoding process, and it proceeds by
running one copy of each process under the assumption that
$x_i = 0$ and the other one under the assumption that $x_i = 1$.

Fig. 13. Maxwell construction in thermodynamics. (a) Pressure-volume
diagram for the liquid-vapor phase transition. (b) Van der Waals curve (using
reduced variables, given by $(p + 3/V^2)(3V - 1) = 8T$ at the reduced
temperature T = 0.85) and the Maxwell construction. Consider the case
of a liquid-gas phase transition of water. If a small amount of liquid is
placed in a completely empty (and hermetically closed) large container at
room temperature, the water evaporates. The vapor exerts pressure on the
walls of the container. By gradually reducing the volume of the container,
we increase the vapor pressure P until it reaches a critical value $P_c$ (which
depends on the temperature). At this point the vapor condenses into liquid
water. The pressure stays constant throughout this transformation. When there
is no space left for the vapor, the pressure starts to rise again, and as shown
in (a) it does so very quickly (since it is difficult to compress water). In
many theoretical descriptions of this phenomenon, a non-monotonic
pressure-volume curve is obtained, as in (b) with the Van der Waals model. The
Maxwell construction allows one to modify the "unphysical" part of this curve
and to obtain a consistent result. We want to join the two decreasing branches of
the theoretical curve with a constant-pressure line, as observed in experiments.
At which height should we place the horizontal line? The basic idea of the
Maxwell construction is that, at the critical pressure $P_c$, the vapor and the
liquid are in "equilibrium". This means that we can transform an infinitesimal
quantity of vapor into liquid (or vice versa) without doing any "work" on the
system. For this reason, the vapor begins its transformation into liquid
at $P_c$. The work done on the system in an infinitesimal transformation is
P dV, where dV represents the variation of the volume. Using this fact, it
can be shown that the above equilibrium condition implies the equality of
the areas of the two regions between the horizontal line and the original
non-monotonic pressure-volume curve. See, e.g., [37].

It can happen that during the decoding process a variable 
receives non-erased messages from several check nodes. In 
such a case, these messages can be distinct and, therefore, 
inconsistent. Such an event is termed a contradiction. Any 
running copy of the decoding process which encounters a 
contradiction terminates. The decoding process finishes once 
all bits have been determined. At this point, each surviving 
copy outputs the determined word. Each such word is by 
construction a codeword which is compatible with the received 
information. Vice versa, for each codeword which is compati- 
ble with the received information, there will be a surviving 
copy. In other words, the M decoder performs a complete 
list decoding of the received message. Fig. 14 shows the
workings of the M decoder by means of a specific example. 

7 Here we describe the decoder as a 'breadth-first' search procedure: at each 
bifurcation we explore in parallel all the available options. One can easily 
construct an equivalent 'depth-first' search: first take a complete sequence of 
choices and, if no codeword is found, backtrack. 
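
The sequential procedure described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: instead of replicating copies, every guessed bit is kept as a free symbol and every determined bit is stored as an affine form over GF(2) in the guessed bits, so that a contradiction shows up as a linear condition on the guesses. As a further assumption, the lowest-index erased bit is guessed (the text picks the position at random), and the names `maxwell_decode` and `insert_condition` are hypothetical.

```python
def insert_condition(basis, form):
    """Reduce an affine form over GF(2) against the stored conditions.

    Bit 0 of `form` is the constant term; bit j + 1 stands for guessed bit x_j.
    Returns True if the condition `form = 0` is linearly independent of the
    conditions already stored in `basis`.
    """
    while form:
        if form == 1:
            raise ValueError("received word is compatible with no codeword")
        lead = form.bit_length() - 1
        if lead not in basis:
            basis[lead] = form
            return True
        form ^= basis[lead]
    return False


def maxwell_decode(checks, received):
    """Sequential M decoder sketch for the binary erasure channel.

    checks:   list of parity checks, each a list of bit positions.
    received: list with entries 0, 1 or None (erasure).
    Returns (number of guesses, number of independent conditions, forms).
    """
    form = {i: b for i, b in enumerate(received) if b is not None}
    pending = [list(c) for c in checks]
    basis, guesses = {}, 0
    while True:
        progress = True
        while progress:                      # standard BP (peeling) phase
            progress = False
            for chk in list(pending):
                unknown = [j for j in chk if j not in form]
                if len(unknown) > 1:
                    continue
                pending.remove(chk)
                progress = True
                total = 0
                for j in chk:
                    total ^= form.get(j, 0)
                if unknown:                  # the check determines one bit
                    form[unknown[0]] = total
                else:                        # the check yields a condition
                    insert_condition(basis, total)
        erased = [i for i in range(len(received)) if i not in form]
        if not erased:
            return guesses, len(basis), form
        form[erased[0]] = 1 << (erased[0] + 1)   # guess: new free symbol
        guesses += 1


# Example: a [7,4] Hamming code with the first four bits erased.
hamming_checks = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]
n_guesses, n_conditions, _ = maxwell_decode(
    hamming_checks, [None, None, None, None, 0, 0, 0])
```

In this sketch the number of guesses minus the number of independent conditions is the base-2 logarithm of the number of codewords compatible with the received word, i.e., the size of the list produced by the complete list decoder.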



[Figure 14 panels: (i) Unknown bits after transmission; (ii) Decoding bit 1 from check 1; (iii) Decoding bit 10 from check 5; (iv) Decoding bit 11; (vi) Guessing bit 6; (vii) Decoding bit 28 ($x_2 + x_6$) from check 6; (viii) Decoding bit 19 ($x_6$) from check 14; (ix) Guessing bit 12; (x) Decoding bit 30 from checks 11 and 15; (xi) Decoding bit 24 ($x_{12}$) from check 3; (xii) Decoding bit 23 ($x_2 + x_{12}$) from check 4; (xiii) Decoding bit 21 ($x_2 + x_{12}$) from check 7; (xiv) Decoding bit 29 from checks 2 and 9; (xv) Decoding bit 26, final step.]

Fig. 14. M decoder applied to a simple example: a (3, 6) LDPC code of length n = 30. Assume that the all-zero codeword has been transmitted. At
the decoder, the received (i.e., known and equal to 0) bits are removed from the bipartite graph. The remaining graph is shown in (i). The first phase is the
standard BP algorithm: in the first three steps, the decoder proceeds as the standard BP decoder and determines the bits 1, 10 and 11, until it gets stuck in a
stopping set shown in (iv). The second phase is specific to the M decoder: it is the guessing/contradiction phase. The decoder guesses the (randomly chosen)
bit 2: this means that it creates two simultaneously running copies, one which proceeds under the assumption that bit 2 takes the value 0, the other which
assumes that this bit takes the value 1. The decoder then proceeds as the standard BP algorithm. Any time it gets stuck, it guesses a new bit and doubles
the number of simultaneously running copies. This process continues until a contradiction occurs, e.g., at the 9th step (ix): the variable node $x_{30}$ (either
$x_{30} = 0$ or $x_{30} = 1$ depending on which copy we are considering) is connected to two check nodes of degree one. The incoming messages from those nodes
are $x_6 + x_{12}$ and $x_{12}$, respectively. Consistency now requires that $x_6 + x_{12} = x_{12}$, i.e., that $x_6 = 0$, so that only the decoding copies corresponding to
$x_6 = 0$ survive. Phases of guessing and phases of standard BP decoding might alternate. Decoding is successful (in the sense that a MAP decoder would
have succeeded) if only a single copy survives at the very end of the decoding process. "Contradictions" can be seen as "confirmations" or "conditions" in
this message-passing setting.



The corresponding instance of the decoding process is depicted 
in Fig. 15 from the perspective of the various simultaneous
copies. 

Let us briefly describe how the analysis of the above 
algorithm is related to the balance condition and the proof 
of Theorem 9. Instead of explaining the balance between the
areas as shown in Fig. ^ we consider the balance of the 
two areas shown in Fig. [2] Note that these two areas differ 
from the previous ones only by a common term, so that 
the condition for balance stays unchanged. From the above 
description it follows that at any given time t there are $2^{H(t)}$
copies running, where $H(t)$ is a natural number which evolves
with time. In fact, each time a bit is guessed, the number of
copies is doubled, while it is halved each time a contradiction
occurs. Call $t_{out}$ the time at which all transmitted bits have
been determined and the list of decoded words is output
($t_{out}$ does not depend upon the particular copy of the process
in consideration). Since the M decoder is a complete list
decoder and since all output codewords have equal posterior
probability, $H(X|Y) = H(t_{out})$. On the other hand, $H(t_{out})$ is
equal to the total number of guesses minus the total number
of contradictions which occurred during the evolution of the
algorithm. As we will see in greater detail in the next section,
the total number of guesses divided by n converges to the area
of the dark gray region in Fig. 2 (a), while the total number of
contradictions divided by n is asymptotically not larger than
the dark gray area in Fig. 2 (b). Therefore, as long as $\epsilon$ is
strictly larger than the value at which we have balance, call
this value $\bar{\epsilon}^{MAP}$, $\lim_{n \to \infty} \frac{1}{n} E[H(X|Y)] > 0$. This implies that
$\epsilon^{MAP} \le \bar{\epsilon}^{MAP}$.

We expect that the number of contradictions divided by
n is indeed asymptotically equal to the dark gray area in
Fig. 2 (b). Although we are not able to prove this statement
in full generality, it follows from Theorem 10 whenever the
hypotheses hold.

Fig. 15. M decoder applied to the simple example shown in Fig. 14. The
all-zero codeword is decoded. The initial phase coincides with the standard
message-passing BP algorithm: a single copy of the process decodes a bit at a time.
After three steps, the BP decoder gets stuck in a stopping set and several steps 
of guessing follow. During this phase the associated entropy H(t) increases. 
After this guessing phase, the standard message passing phase resumes. More 
and more copies terminate due to inconsistent messages arriving at variable 
nodes. At the end only one copy survives. This shows that this example has 
a unique MAP solution. 



B. Message-Passing Setting 

We describe now a message-passing algorithm that is equiv- 
alent to the above sequential formulation. First note that 
because of the code linearity, the symmetries of the channel 
and the decoding algorithm, we can simplify our analysis by 
making the all-zero codeword assumption, see [17]. 

We assign a label $\mu_i^{\epsilon}$ to the variable node of index i.
The label can take three possible values $\mu_i^{\epsilon} \in \{0, *, g\}$. It
can be viewed as the output of some fictitious channel, and
indicates how the algorithm is going to treat that variable node.
The fictitious channel is memoryless: each variable node is
assigned a 0 with probability $1 - \epsilon$, a * with probability $\epsilon(1 - \gamma)$
and a g with probability $\epsilon\gamma$. The parameter $\gamma$ represents the
fraction of guesses ventured so far.

The new message-passing algorithm employs left-to-right
messages $\mu^{x}$ and right-to-left messages $\mu^{y}$, all of which take
values in {0, *, g}. The meaning of the 0 message and the *
message is the same as for the BP algorithm. A g message
indicates that either the bit from which this message emanates
has been guessed or that the value of this bit can be expressed
as a linear combination of other bit values which have been
guessed. Operationally, we can think of the message $\mu_i = g$
as being a shorthand for a non-empty list of indices
$\Theta_i = \{j_1, \dots, j_k\}$. This list indicates that $x_i$ is expressible as
$x_i = x_{j_1} + \cdots + x_{j_k}$, where $\{x_{j_1}, \dots, x_{j_k}\}$ is a set of guessed bits.

This motivates the following update rules for the parity-check
and variable nodes shown in Fig. 16.

(i) Update rule for a parity-check node of degree r: Assume
that the index set for the (r − 1) messages which enter the
check node is $\mathcal{R} = [r - 1]$. Then

$$\mu^{y} = \begin{cases} 0, & \text{if } \forall i \in \mathcal{R},\ \mu_i = 0,\\ *, & \text{if } \exists i \in \mathcal{R},\ \mu_i = *,\\ g, & \text{if } \forall j \in \mathcal{R},\ \mu_j \ne *, \text{ and } \exists i \in \mathcal{R},\ \mu_i = g. \end{cases}$$

Fig. 16. Update rule for parity-check nodes (i) and variable nodes (ii).

With respect to the BP decoder, the only new rule is the one
which leads to $\mu^{y} = g$. It is motivated as follows. Assume
that for all $i \in \mathcal{R}$ we have $\mu_i \in \{0, g\}$ and that at least one
such message is g. This means that the connected variables
$x_i$, $i \in \mathcal{R}$, are either known, have been guessed themselves,
or can be expressed as a linear combination of guessed bits
(and at least one such value is indeed either a guess itself
or expressible as a linear combination of guesses). Since the
variable connected to the outgoing edge is the sum of the
variables connected to the incoming edges, it follows that this
variable is also expressible as a linear combination of guesses.
Therefore, $\mu^{y} = g$ in this case. Operationally, we have r − 1
lists $\Theta_1, \dots, \Theta_{r-1}$ (at least one of which is non-empty)
entering the check node. The outgoing list $\Theta^{y}$ is obtained as
the union of the incoming lists, where indices which occur
an even number of times in the incoming lists are eliminated.
The list $\Theta^{y}$ provides a resolution rule for $x_1 + \cdots + x_{r-1}$, and
therefore for the variable connected to the outgoing edge.

In the above description and the definition of the message- 
passing rules we have ignored the possibility that the union 
of the incoming lists (at least one of which is non-empty) 
is empty. This can happen if a complete cancellation occurs 
(every index appears an even number of times in the incoming 
lists). Fortunately, as we shall see, this assumption has no
influence on the proof of Theorem 9.

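(ii) Update rule for a variable node of degree l: Assume that
the index set for the messages which enter the variable
node (the l − 1 edge messages and the channel label) is
$\mathcal{L} = [l - 1] \cup \{\epsilon\}$. Then

$$\mu^{x} = \begin{cases} 0, & \text{if } \exists i \in \mathcal{L},\ \mu_i = 0,\\ *, & \text{if } \forall i \in \mathcal{L},\ \mu_i = *,\\ g, & \text{if } \forall i \in \mathcal{L},\ \mu_i \ne 0, \text{ and } \exists j \in \mathcal{L},\ \mu_j = g. \end{cases}$$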

Once again, it should be enough to motivate the rule which
leads to $\mu^{x} = g$. Recall that g indicates that the bit is not
known but that it has either been guessed or that the bit is
expressible as a linear combination of guessed bits. Therefore,
if none of the incoming messages is a 0, and at least one is a
g, then the outgoing message is a g. Operationally, this means
that the outgoing list is equal to one of the incoming non-empty
lists. E.g., if the bit itself has been guessed (i.e., $\mu_i^{\epsilon} = g$)
and all other incoming messages are * then the outgoing
message is {i}.
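
The two update rules can be written out directly. The sketch below is illustrative only: the string encoding of the three message values and the function names are assumptions, and the list version implements the "union with even multiplicities eliminated" as a symmetric difference of sets.

```python
ZERO, ERASED, GUESS = "0", "*", "g"


def check_update(incoming):
    """Outgoing label at a parity-check node from the r - 1 incoming labels."""
    if any(m == ERASED for m in incoming):
        return ERASED          # one erased input erases the output
    if any(m == GUESS for m in incoming):
        return GUESS           # no erasure and at least one g
    return ZERO                # all inputs are known zeros


def variable_update(incoming):
    """Outgoing label at a variable node.

    `incoming` holds the l - 1 edge labels together with the channel label.
    """
    if any(m == ZERO for m in incoming):
        return ZERO            # the bit is known
    if any(m == GUESS for m in incoming):
        return GUESS           # not known, but expressible through guesses
    return ERASED


def check_list_update(incoming_lists):
    """Outgoing list at a check node.

    Indices occurring an even number of times cancel, so the result is the
    symmetric difference of the incoming lists.
    """
    out = set()
    for lst in incoming_lists:
        out ^= set(lst)
    return out
```

For instance, `check_list_update([{1, 2}, {2, 3}])` yields `{1, 3}`, reflecting $(x_1 + x_2) + (x_2 + x_3) = x_1 + x_3$ over GF(2).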

From the messages we can obtain estimates $\nu_i$, $i \in [n]$, of the
transmitted bits (the $\nu_i$'s are node- rather than edge-quantities).
In order to obtain these estimates we apply the same rule as
for the variable node update, see (ii) above, with incoming
messages corresponding to all of the neighboring check nodes.
In other words, for a degree l variable node, we have
$\mathcal{L} = [l] \cup \{\epsilon\}$ instead of $\mathcal{L} = [l - 1] \cup \{\epsilon\}$.

The consistency of the estimates implies a set of linear
conditions on the guessed variables. Consider all the messages
$\mu_i$ entering a fixed variable node and the associated (possibly
empty) lists $\Theta_i = \{j_1^i, \dots, j_k^i\}$. Let $\mathcal{L}_\mu$, $\mu \in \{0, g, *\}$, denote
the subsets of indices i with $\mu_i = \mu$.

1) If $\mathcal{L}_0 \ne \emptyset$ and $\mathcal{L}_g \ne \emptyset$, then, for any $i \in \mathcal{L}_g$, we have
the condition

$$x_{j_1^i} + \cdots + x_{j_k^i} = 0 \mod 2. \qquad (17)$$

The total number of resulting conditions is $|\mathcal{L}_g|$.

2) If $\mathcal{L}_0 = \emptyset$ and $|\mathcal{L}_g| \ge 2$, then fix $i \in \mathcal{L}_g$. For any
$l \in \mathcal{L}_g \setminus \{i\}$, we have the condition

$$x_{j_1^i} + \cdots + x_{j_k^i} = x_{j_1^l} + \cdots + x_{j_{k'}^l} \mod 2. \qquad (18)$$

The total number of resulting conditions is $|\mathcal{L}_g| - 1$.
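
As a small illustration of the two cases above, the following sketch lists the conditions generated at a single variable node. The input encoding (pairs of a label and a list of guessed indices) and the name `conditions_at_node` are assumptions made for this example; each returned set S stands for the condition $\sum_{j \in S} x_j = 0 \mod 2$.

```python
def conditions_at_node(messages):
    """Linear conditions produced at one variable node.

    messages: list of (label, guess_indices) pairs entering the node, where
    label is "0", "*" or "g" and guess_indices is the list attached to a
    g message (ignored for the other labels).
    """
    has_zero = any(label == "0" for label, _ in messages)
    g_lists = [set(lst) for label, lst in messages if label == "g"]
    if has_zero:
        # Case 1: the bit is known to be 0, so every g-list must sum to 0.
        return g_lists
    if len(g_lists) >= 2:
        # Case 2: all g-lists describe the same bit; compare with a fixed one.
        reference = g_lists[0]
        return [reference ^ other for other in g_lists[1:]]
    return []
```

The number of returned conditions matches the counts stated in the two cases: one per g message in the first case, and one fewer than the number of g messages in the second.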

The algorithm stores in memory each new condition produced
during its execution. Notice that each condition involves
only bits $x_i$ for which $\mu_i^{\epsilon} = g$. It can happen that a
particular condition is either linearly dependent upon previous 
ones or empty. The last case occurs if the corresponding lists 
are empty, which in turn may be the consequence of a previous 
parity-check node update (see the description of the check- 
node update rule above). Given a set of guesses, any subset 
of them whose values can be chosen freely without violating 
any of the conditions produced by the M decoder, is said to be 
independent. Of course, the maximal number of independent 
guesses is equal to the number of guesses minus the number 
of linearly independent conditions. 

Conditions are equivalent, in the present setting, to what have
been called contradictions in the description of Sec. VI-A.
In fact, if one thinks of guessed bits as i.i.d. uniformly random
in {0, 1} then each new, independent condition, cf. Eqs. (17),
(18), is satisfied with probability 1/2.

It is useful to establish the following convention for denoting
the successive message-passing iterations. At the t-th
iteration (with t = 0, 1, ...) we first update all the left-to-right
messages and then all the right-to-left messages. We
have therefore
$\cdots \to \mu^{y}(t - 1) \to \mu^{x}(t) \to \mu^{y}(t) \to \mu^{x}(t + 1) \to \cdots$.
Notice that, as the number of iterations
increases, a given message can change its status according to
one of the transitions $* \to g$, $g \to 0$ or $* \to 0$. Therefore
the algorithm surely stops after a finite number of iterations
(at most twice the number of edges in the graph). We shall
denote the fixed point as $\mu^{x}(\infty)$, $\mu^{y}(\infty)$. At the t-th iteration
the algorithm delivers an estimate $\nu_i(t)$, $i \in [n]$, of the i-th
transmitted bit.



C. The Case of Tree Graphs and Some Simple Consequences 
As for other message-passing algorithms, it is instructive to 
study the behavior of the M decoder on trees. In particular, we 
will show that: (a) On a tree the sequential M decoder guesses 
exactly as many variables as there are degrees of freedom in 
the system (implying that all these guesses are independent); 
(b) on a tree the number of independent guesses ventured 
by the (not necessarily sequential) M decoder by the end of the
decoding process is equal to the number of degrees of freedom 
of the system and it can be computed in a local way; (c) 
the same local counting formula gives in general (for Tanner 
graphs that are not necessarily trees) an upper bound on the 
number of independent guesses which remain at the end of 
the decoding process. 

We have already explained that, for the purpose of analysis, 
we can make the all-zero codeword assumption. Therefore, in 
the sequel we only have to consider linear systems of equations 
with a zero right side. We say that the M decoder is bit-by- 
bit (or sequential) if any time the BP phase comes to a halt, 
the decoder guesses a single unknown bit and then proceeds 
by processing all consequences until no further progress is 
achieved. 

Lemma 9 (Number Of Guesses of Sequential M Decoder): 
Consider a binary linear system of equations with right side 
equal to zero and k degrees of freedom (i.e., k is equal to the 
number of variables minus the rank of the system). Assume 
that the Tanner graph associated to this system is a tree. Then 
the sequential M decoder ventures exactly k guesses during 
the decoding process and all these guesses are independent. 

Proof: Without loss of generality we can assume that 
there are no check leaf nodes. In fact, whenever degree-one 
check nodes are present, the standard BP decoder can be run 
until all such nodes have been removed. For each variable 
node which is removed in this fashion, the rank of the system 
is decreased by exactly one as well. 

We claim that the resulting system of equations has full 
rank. To see this, assume to the contrary that there is a non-zero
linear combination of equations that yields zero. Look at
the Tanner graph corresponding to this subset of equations: all 
variable nodes have (even) degree at least two and all check 
nodes have degree at least two (as argued above). It is well 
known that a graph with minimum degree at least two contains 
at least one cycle, contradicting the hypothesis that the initial 
graph was a tree. 
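
The full-rank claim is easy to check on small instances. The sketch below is an illustration with hypothetical example graphs: it computes the GF(2) rank of a parity-check system whose Tanner graph is a tree without degree-one check nodes, and contrasts it with a system whose graph contains a cycle.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}
    for v in rows:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)


def rows_from_checks(checks):
    """Encode each check (a list of distinct bit positions) as a bitmask."""
    return [sum(1 << j for j in chk) for chk in checks]


# Tree: 6 variables, 3 checks, 8 edges, connected; no check has degree one.
tree_checks = [[0, 1, 2], [2, 3, 4], [4, 5]]
# Cycle of length six: the three equations sum to zero.
cycle_checks = [[0, 1], [1, 2], [0, 2]]

tree_rank = gf2_rank(rows_from_checks(tree_checks))
cycle_rank = gf2_rank(rows_from_checks(cycle_checks))
```

For the tree the rank equals the number of checks, so the system has n − m = 3 degrees of freedom; for the cycle one equation is redundant.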

Consider therefore a Tanner graph which is a tree and all of
whose leaf nodes are variables. Let $l_i$, $i \in [n]$ ($r_i$, $i \in [m]$), denote
the degree of variable (check) node i. By our remarks above,
the corresponding system of equations has n − m degrees of
freedom. Therefore, it is clear that the M decoder has to guess
at least n − m bits before it stops. We claim that it ventures
exactly n − m guesses, i.e., that on a tree the sequential guesses
are independent.

At the start of the decoding process all messages are 
erasures. We will show that at the end of the decoding process 
each edge carries exactly one g message in one direction and 
a * message in the other direction. This proves our claim: 
it implies that a variable node which has been guessed, and
all of whose outgoing messages therefore carry a g message, has
no incoming g message. It is therefore not constrained by
any of the other guesses, i.e., it is independent. Clearly, at
the end of the decoding process each edge has to carry a g
message in at least one direction; otherwise the connected bit
has not been determined yet, contradicting the assumption that
the M decoder has halted.

Fig. 17. In (a) consider the messages flowing along edge e. Assume that the
outgoing message (shown in a frame) switches as a consequence of a newly
guessed bit from * to g. Assume further that the incoming message flowing
in the opposite direction is g as well. This provides the induction step from
odd levels to even levels. As indicated in the figure, it then follows that both
messages along edge e are g as well. The case of an edge exiting a variable
node is shown in (b) and follows by essentially the same argument.

Let us show that it cannot carry a g message in both directions.
Initially all messages are *. The sequential M decoder
proceeds in phases, guessing a bit and then determining all
consequences of this guess during the BP phase until it gets
stuck again. Let us call one such guess followed by the BP
phase one iteration. Let us agree that during the BP phase
the consequences of a newly guessed bit are computed in
order of increasing distance from the guessed bit. This means
that we first process all edges directly connected to this bit
(call this level zero), then all edges at distance one (call this
level one) and so on. Assume that when we process level t,
$t \ge 1$, we encounter an edge whose outgoing (away from the
newly guessed bit) message switches from * to g and whose
incoming message already is g. We claim that then the same
must have occurred at level t − 1. This is quickly verified by
checking explicitly both cases: an edge which goes from a
check node to a variable node (odd levels t; left picture in
Fig. 17) and the case of an edge which goes from a variable
node to a check node (even levels t; right picture in Fig. 17). If
we apply this argument inductively, we see that the guessed
variable node must have had an incoming message which was
g, contradicting the fact that the M decoder decided to guess
this bit. ■

What happens if we run the M decoder in a non-sequential 
way, i.e., if we guess many/several bits each time we get 
stuck? In this case it can happen that some of the guesses are 
dependent. Nevertheless, the number of independent guesses 
remaining at the end of the process is still equal to the degrees 
of freedom of the system of equations. More importantly, on 
a tree this number of independent guesses can be computed 
in a local way. 

Lemma 10 (Number of Independent Guesses): Consider a
binary linear system of equations with right side equal to
zero and k degrees of freedom (i.e., k is equal to the number
of variables minus the rank of the system). Assume that the
Tanner graph associated to this system is a tree and that it
contains no check nodes of degree one. Then the number of
independent guesses ventured by the M decoder at the end of
the decoding process is equal to k. Further, let G denote the
total number of guesses of the M decoder, denote by $l_i^g$ the
number of incoming g messages at variable node i (including,
if applicable, the guess of the bit itself), and by $C^g$ the subset
of check nodes all of whose incoming messages are g. Then

$$k = G - \sum_{i \in [n]} (l_i^g - 1) + \sum_{j \in C^g} (r_j - 1). \qquad (19)$$

Proof: By definition of the algorithm, at the end of the
decoding process all bits have been determined (i.e., guessed 
or expressed in terms of guessed bits). This means that among 
the guesses ventured by the M decoder there must be k 
independent such guesses. Now note that the final state of 
the messages is independent of the order in which the guesses 
are taken. It is convenient to imagine that we first venture the 
k independent guesses and then apply the BP decoder. At the 
end of this phase all bits are known. Further, from Lemma |9] 
we know that if = 1 for all i e [n] and C g is the empty set. 
Therefore, the stated counting formula is correct at this stage. 
Assume now we proceed in iterations, adding one guess at a 
time and propagating all its consequences. We will verify that 
the counting formula stays valid. Assume therefore that the 
counting formula is correct at the start of an iteration and add 
a further guess, lets say of variable i. This extra guess increases 
if by one and increases the number of guesses by one, keeping 
the counting formula intact. Consider now the ensuing BP 
phase. Consider an edge e emanating from a variable node 
i, the check node connected to it, call it j and all the edges 
and variable nodes connected to this check node. Assume that 
the message from i to j is * (in the case that this message is 
already g, the message does not change and there is nothing 
to prove). As a consequence the message from j to i must 
be a g because of the argument above. Also, all the incoming 
messages into j but the one from i must be g as well (otherwise
the update rule would have been violated at node j). Update 
all the corresponding edge messages. If the message from i 
to j does not change, then neither does any of the messages 
outgoing at the check node and the counting formula stays 
valid. If, on the other hand, the outgoing message along edge 
e flips to g then so do all the messages outgoing from the 
check node j. Assume that the check node has degree $r_j$.
Then $C^g$ now contains j. This increases the right-hand side
of the counting formula by $r_j - 1$. On the other hand, it also
increases $l_{i'}^g$ by one for all $i' \in V$ which are connected to check
node j, except for node i (the corresponding message was already
a g). In total this decreases the right-hand side of the counting
formula by $r_j - 1$, so the two changes cancel. ■

Each part of the counting equation (19) has a pleasing
interpretation. As stated, G is the total number of ventured
guesses. If a variable node has $l^g$ incoming g messages
then these correspond to $l^g$ linear equations, each of which
determines the same bit. This gives rise to $(l^g - 1)$ linear
conditions which the G guesses have to fulfill. But not all
these conditions are linearly independent. Consider Fig. 18. If
a check node of degree r has all of its incoming messages
equal to g then the r equations which correspond to the r
outgoing messages are identical, i.e., $r - 1$ of them are linearly
dependent. The last term in the counting formula (19) therefore
corrects the over-counting of dependent conditions.




[Figure 18: two panels, labeled "time t" and "time t + 1".]

Fig. 18. Computation of the number of linearly independent conditions.
To each of the incoming edges corresponds a list. To keep things simple and
without essential loss of generality, assume that the list arriving from
variable node i is {i}. The three outgoing lists are then {2, 3}, {1, 3}, and
{1, 2}. Compare the incoming and outgoing list at node 1: we get the
condition $x_1 = x_2 + x_3$. But exactly the same condition appears at node 2
and node 3. In general, a check node of degree r, all of whose incoming
messages are g, generates $r - 1$ linearly dependent conditions.

Example 8: Consider a code whose Tanner graph is a tree
and all leaves are variable nodes. Let the set of variables
(checks) be indexed by [n] ([m]), and let $l_i$, $i \in [n]$ ($r_i$,
$i \in [m]$), be the degree of variable (check) node i. Assume
that the M decoder guesses all leaf (variable) nodes and then
proceeds by message passing. It is not very hard to see that
in this setting the decoder proceeds with the message-passing
phase (starting from the leaf nodes) until all variables have
been determined and that no further guesses have to be made.
Further, at the end of the decoding process all messages are
g.

Let us determine the number of independent guesses at the
end of the decoding process using the counting formula above.
Note that for each leaf node we have $l^g = 2$ (one guess and
one additional incoming g message). For all internal variable
nodes we have $l_i^g = l_i$, since all incoming messages are g.
Finally, $C^g = C$. If we let $n_l$ denote the
number of leaf nodes, so that $G = n_l$, we get that the number of
independent guesses is equal to

$$n_l - \sum_{i \in \text{leaves}} (2 - 1) - \sum_{i \in [n] \setminus \text{leaves}} (l_i - 1) + \sum_{i \in [m]} (r_i - 1)$$
$$= -\sum_{i \in [n]} (l_i - 1) + \sum_{i \in [m]} (r_i - 1) = n - m.$$

Here the first step uses $l_i = 1$ at the leaves, and the second uses
that both degree sums count the edges of the graph.
This is of course the expected result since the system has
exactly $n - m$ degrees of freedom.
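The bookkeeping of Example 8 can be checked numerically. The following Python sketch is an illustration only: the helper names `random_tree_tanner_graph` and `independent_guesses` are hypothetical, and the random construction is merely one convenient way to obtain a tree Tanner graph whose leaves are all variable nodes. It evaluates the counting formula with $l^g = 2$ at the leaves, $l_i^g = l_i$ at the internal variable nodes, and $C^g = C$.

```python
import random


def random_tree_tanner_graph(num_checks, seed=0):
    """Grow a random tree Tanner graph in which every leaf is a variable node.

    Returns (var_degree, check_degree), the lists of node degrees.
    Assumes num_checks >= 1 so that no variable node is isolated.
    """
    rng = random.Random(seed)
    var_degree = [0]          # start from a single variable node
    check_degree = []
    for _ in range(num_checks):
        # Attach a new check node to a random existing variable node ...
        parent = rng.randrange(len(var_degree))
        var_degree[parent] += 1
        # ... and hang one to three fresh variable nodes below the check,
        # so every check has degree at least two and is never a leaf.
        children = rng.randint(1, 3)
        check_degree.append(1 + children)
        var_degree.extend([1] * children)
    return var_degree, check_degree


def independent_guesses(var_degree, check_degree):
    """Evaluate G - sum_i (l_i^g - 1) + sum_j (r_j - 1) as in Example 8."""
    leaves = [d for d in var_degree if d == 1]
    internal = [d for d in var_degree if d > 1]
    total_guesses = len(leaves)                      # G: all leaves are guessed
    variable_conditions = len(leaves) * (2 - 1)      # l^g = 2 at each leaf
    variable_conditions += sum(d - 1 for d in internal)  # l^g = l_i elsewhere
    check_correction = sum(r - 1 for r in check_degree)  # C^g = C
    return total_guesses - variable_conditions + check_correction


var_degree, check_degree = random_tree_tanner_graph(num_checks=30, seed=1)
print(independent_guesses(var_degree, check_degree),
      len(var_degree) - len(check_degree))
```

Both printed numbers should coincide, since the count reduces to $n - m$ for any such tree.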

So far we have only considered sets of equations whose 
Tanner graph is a tree. What happens if we run the M decoder 
on a general system of equations? For a general Tanner graph,
the above counting of the total number of independent guesses 
is not necessarily tight. The counting of the total number of 
conditions generated by the M decoder is always correct. But 
it can happen that besides the obvious over-counting at check 
nodes, there are other dependencies generated by loops in 
the graph which are not considered in the counting formula. 



Therefore, in general we only get a lower bound. Let us state 
this explicitly. 

Lemma 11 (Lower Bound on Independent Guesses): 
Consider a binary linear system of equations with right side 
equal to zero and k degrees of freedom (i.e., k is equal to the 
number of variables minus the rank of the system). Assume 
that the Tanner graph associated to this system contains no 
check nodes of degree one. Let G denote the number of 
all guesses of the M decoder, denote by $l_i^g$ the number of
incoming g messages at variable node i (including the guess
if this node has been guessed), and by $C^g$ the subset of all
check nodes all of whose incoming messages are g. Then

$$k \geq G - \sum_{i \in V} (l_i^g - 1) + \sum_{i \in C^g} (r_i - 1). \tag{20}$$

D. Density Evolution Analysis 

Let us now perform the usual DE analysis. Let $x_0^t$, $x_*^t$, and $x_g^t$
denote the probabilities that a left-to-right message at time t is equal
to 0, *, and g, respectively, and let $y_0^t$, $y_*^t$, and $y_g^t$ denote the
corresponding probabilities for a right-to-left message.

(i) At the check node side the DE relations read

$$y_0^t = \rho(x_0^t),$$
$$y_*^t = 1 - \rho(x_0^t + x_g^t) = 1 - \rho(1 - x_*^t),$$
$$y_g^t = 1 - y_0^t - y_*^t = \rho(x_0^t + x_g^t) - \rho(x_0^t).$$

(ii) At the variable node side the DE relations are

$$x_0^{t+1} = 1 - \epsilon \lambda(y_*^t + y_g^t),$$
$$x_*^{t+1} = (1 - \gamma)\, \epsilon \lambda(y_*^t),$$
$$x_g^{t+1} = \epsilon \lambda(y_*^t + y_g^t) - (1 - \gamma)\, \epsilon \lambda(y_*^t).$$

According to our convention, the iteration counter is increased
only in the variable node operation. Moreover, the variables
$x_*^t$ ($y_*^t$) and $x_*^t + x_g^t$ ($y_*^t + y_g^t$) satisfy the same equations as the
fractions of erased messages in the standard BP decoder with
erasure probabilities $\epsilon(1 - \gamma)$ and $\epsilon$, respectively. This is an
immediate consequence of the update rules of the M decoder.

When the time t tends to $\infty$, DE converges to the fixed-point
probability distribution. To settle our notation, we write
$(x_0^t, x_*^t, x_g^t) \to (x_0^\infty(\epsilon, \gamma), x_*^\infty(\epsilon, \gamma), x_g^\infty(\epsilon, \gamma))$ and equivalently
$(y_0^t, y_*^t, y_g^t) \to (y_0^\infty(\epsilon, \gamma), y_*^\infty(\epsilon, \gamma), y_g^\infty(\epsilon, \gamma))$. Observe
that $x_*^\infty(\epsilon, \gamma)$ satisfies the equation $x = \epsilon(1 - \gamma)\lambda(1 - \rho(1 - x))$,
while $x_0^\infty(\epsilon, \gamma) = x_0^\infty(\epsilon)$ satisfies the equation
$(1 - x) = \epsilon\lambda(1 - \rho(1 - (1 - x)))$.
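As an illustration, the three-state recursion can be iterated numerically and compared with ordinary BP density evolution. The Python sketch below makes one assumption that is not stated in the text: at time 0 the left-to-right messages simply carry the channel observations, i.e., $(x_0, x_*, x_g) = (1 - \epsilon, \epsilon(1 - \gamma), \epsilon\gamma)$. The function names are hypothetical, and the (3, 6)-regular pair $(\lambda, \rho) = (x^2, x^5)$ is used only as a test case.

```python
def m_decoder_density_evolution(eps, gamma, lam, rho, iterations=5000):
    """Iterate the DE relations for the message probabilities (0, *, g).

    lam and rho are the edge-perspective degree distributions (callables).
    Returns ((x0, xs, xg), (y0, ys, yg)); 's' stands for the '*' message.
    """
    # Assumed initial condition: messages equal the channel observations.
    x0, xs, xg = 1.0 - eps, eps * (1.0 - gamma), eps * gamma
    y0 = ys = yg = 0.0
    for _ in range(iterations):
        # (i) check node side
        y0 = rho(x0)
        ys = 1.0 - rho(x0 + xg)
        yg = 1.0 - y0 - ys
        # (ii) variable node side (this advances the iteration counter)
        x0 = 1.0 - eps * lam(ys + yg)
        xs = (1.0 - gamma) * eps * lam(ys)
        xg = eps * lam(ys + yg) - (1.0 - gamma) * eps * lam(ys)
    return (x0, xs, xg), (y0, ys, yg)


def bp_density_evolution(eps, lam, rho, iterations=5000):
    """Standard BP density evolution for the erased-message fraction."""
    x = eps
    for _ in range(iterations):
        x = eps * lam(1.0 - rho(1.0 - x))
    return x


lam = lambda z: z ** 2   # (3, 6)-regular ensemble
rho = lambda z: z ** 5
(x0, xs, xg), _ = m_decoder_density_evolution(0.46, 0.02, lam, rho)
print(xs, bp_density_evolution(0.46 * (1.0 - 0.02), lam, rho))
print(xs + xg, bp_density_evolution(0.46, lam, rho))
```

Each printed pair should agree: $x_*$ follows BP at erasure probability $\epsilon(1 - \gamma)$, and $x_* + x_g$ follows BP at erasure probability $\epsilon$.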

Notice that the asymptotic state of the algorithm has the
following structure. The variable nodes whose final estimate is
$\hat\mu_i(\infty) = *$ or $\hat\mu_i(\infty) = g$ form a stopping set: in fact this is the largest
stopping set contained in the set of variable nodes for which the
observation is $\mu_i = *$ or $\mu_i = g$. Further, the set of variable nodes such that
$\hat\mu_i(\infty) = *$ forms a stopping set contained in the previous one:
this is the largest stopping set contained in the set $\mu_i = *$.

In the analysis below we shall repeatedly use the following
trick. We shall compute expectations with respect to
asymptotic ($t = \infty$) incoming messages in a given node.
In such computations, we shall treat such messages as i.i.d.
with distribution $(x_0^\infty, x_*^\infty, x_g^\infty)$ (for left-to-right messages)
or $(y_0^\infty, y_*^\infty, y_g^\infty)$ (for right-to-left messages). As long as
$(\epsilon, \gamma)$ take non-exceptional values, i.e., at continuity points of
$(x_0^\infty(\epsilon, \gamma), x_*^\infty(\epsilon, \gamma), x_g^\infty(\epsilon, \gamma))$, cf. Section V, this is justified
as follows. First consider messages after a finite number of
iterations t. For n large enough these are independent because
the Tanner graph is locally a tree. But, if $(\epsilon, \gamma)$ is non-exceptional,
the number of messages which change between the
t-th iteration and the asymptotic state is bounded by $n\delta(t)$ with
$\delta(t) \to 0$ as $t \to \infty$. This argument is essentially the same as
the one of App. III-A.

E. Guessing Strategy 

In the analysis of the M decoder, we can choose the order
of guesses at our convenience. As long as the message is
completely decoded and the final estimates are $\hat\mu_i(\infty) \in \{0, g\}$
for any bit i, the algorithm realizes a complete list decoding.

We shall adopt the following strategy: we perform $n_{\text{rounds}}$
"decoding rounds". Our progress will be measured by the
parameter $\gamma$, which is initially set to zero and which advances
by $\Delta\gamma = 1/n_{\text{rounds}}$ in each round.

Set $\gamma = 0$. Start with the messages received via BEC($\epsilon$)
and apply BP decoding until the algorithm gets stuck. Then
consider each of the bits not yet determined and set $\mu_i = g$
independently for each of them with probability $\Delta\gamma/(1 - \gamma)$.
(In the first round this probability is equal to $\Delta\gamma$.) Set $\gamma =
\gamma + \Delta\gamma$.⁸ Apply the M decoder until it gets stuck. This is
repeated $n_{\text{rounds}}$ times until $\gamma = 1$. If at any earlier phase
complete decoding is achieved, the algorithm is halted and
the current set of decoded codewords is output.
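The choice of the per-round selection probability $\Delta\gamma/(1 - \gamma)$ can be sanity-checked with exact rational arithmetic. The Python sketch below (the function name is hypothetical) tracks the probability that a fixed bit has been selected at least once after each round and compares it with the running value of $\gamma$.

```python
from fractions import Fraction


def cumulative_selection_probability(n_rounds):
    """Return a list of (gamma, Pr[bit selected at least once]) per round."""
    step = Fraction(1, n_rounds)          # Delta gamma = 1 / n_rounds
    gamma = Fraction(0)
    not_selected = Fraction(1)            # Pr[bit never selected so far]
    history = []
    for _ in range(n_rounds):
        per_round = step / (1 - gamma)    # selection probability this round
        not_selected *= 1 - per_round
        gamma += step
        history.append((gamma, 1 - not_selected))
    return history


for gamma, selected in cumulative_selection_probability(5):
    print(gamma, selected)
```

In each printed pair the two entries should be equal, and the last pair should reach 1, so that after the final round every bit has been selected.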

The analysis becomes simpler (and the algorithm more
efficient) if we take $\Delta\gamma \to 0$. We shall always think of
this limit being taken after $n \to \infty$. We will see that in
this limit the appearance of contradictions is sharply concentrated
on those rounds which include a discontinuity of the
EXIT curve. In other words, we will see that the algorithm
alternates between the following two phases which are well
separated: in the "guessing phase" the algorithm guesses a
small fraction of bits and then processes the consequences, but
these consequences do not propagate too far and essentially
stay local; in the "contradiction phase", on the other hand, the
algorithm suddenly discovers many relationships (finds many
contradictions) and the size of the residual graph changes by
a constant fraction which is independent of the step size $\Delta\gamma$.

F. Analysis: Guess Work 

Consider a non-exceptional point $(\epsilon, \gamma)$ and let $n\Delta G$ be the
number of newly guessed variables when $\gamma$ is changed by an
amount $\Delta\gamma > 0$.

The process can be described as follows. For each $i \in [n]$,
i is selected independently with probability $\Delta\gamma/(1 - \gamma)$. For
each selected bit, we consider the present estimate provided
by the M decoder: $\hat\mu_i(\infty) \in \{0, g, *\}$. If $\hat\mu_i(\infty) = *$, the
observation on i is changed from $\mu_i = *$ to $\mu_i = g$: the
counter of newly guessed variables is increased by one. By
linearity of expectation, we get

$$E[\Delta G] = \frac{1}{n} \sum_{i \in [n]} \Pr(i \text{ is selected}) \Pr(\hat\mu_i(\infty) = *)$$
$$= \frac{\Delta\gamma}{1 - \gamma}\, \epsilon (1 - \gamma) \Lambda(y_*^\infty) = \epsilon \Lambda(y_*^\infty)\, \Delta\gamma.$$

Notice that in this computation we assumed $n \to \infty$ and
$t \to \infty$ afterwards.

Recall that, after $\gamma$ is changed to $\gamma + \Delta\gamma$ and the $n\Delta G$
new guesses are introduced, the message passing M decoder
is started again until a new fixed point is reached.

⁸ Note that if a bit is first selected with probability $\gamma$ and then independently
selected with probability $\Delta\gamma/(1 - \gamma)$, then the probability that it was selected
at least once is equal to $\gamma + \Delta\gamma$. This is the rationale for our choice of
parameters.

G. Analysis: Confirmation Work 

At each step of the above algorithm, it may happen that
several g messages are transmitted to the same variable node
$x_i$. Each of these lists corresponds to a distinct resolution rule
for $x_i$. Their convergence on the same node imposes some
non-trivial condition on the variables which appear in the
resolution rules. Here we estimate the number of independent
such conditions by exploiting Lemma 11 above. Notice that
in Lemma 11 we assume $\mu_i \in \{g, *\}$. In order to make
contact with this assumption we could first run the classical
BP decoder until no further progress can be made. We could
then directly apply Lemma 11 to the residual graph. The
disadvantage of this strategy is that it is not
so straightforward to relate the progress of the M decoder on
the residual graph to the original DE equations.

Alternatively we can apply Lemma 11 directly to the
original graph if (i) we do not count contradictions generated
at variable nodes which receive at least one 0 message (either
from the channel or from the graph) and (ii) we count towards
the degree of a check node only those edges whose incoming
messages are not 0. With these two conventions one can check
that Lemma 11 holds for a general graph including degree-one
check nodes as well as variable nodes which are known.

Let $(\epsilon, \gamma)$ be a non-exceptional point and denote by nC the
number of contradictions as estimated by the right-hand side
of (20). The first term counts the number of conditions arising
at the variable nodes. We get

$$E\Big[\frac{1}{n} \sum_{i \in V} \max(l_i^g - 1, 0)\Big] = \epsilon(1 - \gamma) \sum_{l} \Lambda_l\, E_l\{\max(n_g - 1, 0)\, I_{n_0 = 0}\} + \epsilon\gamma \sum_{l} \Lambda_l\, E_l\{\max(n_g, 0)\, I_{n_0 = 0}\},$$

where $I_A$ is the indicator function for the event A and
where $n_g$, $n_0$, and $n_*$ count the number of incoming g, 0,
and * messages. Here the limits $n \to \infty$ and $t \to \infty$
are understood and $E_l$ denotes expectation with respect to the
multinomial variables $n_g, n_0, n_*$ with sum l and parameters
$y_g^\infty, y_0^\infty, y_*^\infty$. Note that we have the indicator function $I_{n_0 = 0}$
since by our remarks above we should only consider nodes
"in the residual graph", i.e., nodes which were not already
determined in the BP phase as a consequence of the received
bits. Throughout this section we shall adopt the shorthands
$y_0, y_g, y_*$ for $y_0^\infty, y_g^\infty, y_*^\infty$ (and analogous ones for left-to-right
messages). By computing these expectations we get

$$E\Big[\frac{1}{n} \sum_{i \in V} \max(l_i^g - 1, 0)\Big] = \epsilon(1 - \gamma)\{\Lambda'(y_* + y_g)\, y_g - \Lambda(y_* + y_g) + \Lambda(y_*)\} + \epsilon\gamma\, \Lambda'(y_* + y_g)\, y_g. \tag{21}$$

We must now evaluate the correction term in (20). Consider
a check node a. Assume that its "residual" degree is $r'_a$, i.e.,
$r'_a$ counts the number of edges whose incoming messages are
not zero. If the corresponding $r'_a$ outgoing messages are all
g (equivalently, the $r'_a$ incoming messages are all g), then the
same condition has been overcounted $r'_a - 1$ times. We denote
the set of such check nodes as $C^g$ and obtain

$$E\Big[\frac{1}{n} \sum_{a \in C^g} (r'_a - 1)\Big] = \frac{\Lambda'(1)}{\Gamma'(1)} \sum_{r} \Gamma_r\, E_r\{\max(n_g - 1, 0)\, I_{n_* = 0}\},$$

where $E_r$ denotes expectation with respect to the multinomial
variables $n_0, n_g, n_*$ with sum r and parameters $x_0^\infty, x_g^\infty, x_*^\infty$.
Once again, it is quite easy to compute the above expectations.
One obtains

$$E\Big[\frac{1}{n} \sum_{a \in C^g} (r'_a - 1)\Big] = \frac{\Lambda'(1)}{\Gamma'(1)} \{\Gamma'(1 - x_*)\, x_g - \Gamma(1 - x_*) + \Gamma(1 - x_* - x_g)\}. \tag{22}$$

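By taking the difference of Eqs. (21) and (22), and after a
few algebraic manipulations (which use the fixed-point relations
$\epsilon\Lambda'(y_* + y_g) = \Lambda'(1)(x_* + x_g)$ and $\Gamma'(1 - x_*) = \Gamma'(1)(1 - y_*)$),
we finally get the desired result

$$E[C] = F(x, \epsilon, \gamma),$$

where

$$F(x, \epsilon, \gamma) = \Lambda'(1)[x_*(1 - y_*) - (x_* + x_g)(1 - y_* - y_g)] - \epsilon(1 - \gamma)[\Lambda(y_* + y_g) - \Lambda(y_*)] + \frac{\Lambda'(1)}{\Gamma'(1)}[\Gamma(1 - x_*) - \Gamma(1 - x_* - x_g)].$$

Here we used the shorthand x for the vector
$(x_*, x_g, x_0, y_*, y_g, y_0)$.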
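The identity "(21) minus (22) equals F" can be spot-checked numerically at a density-evolution fixed point. The Python sketch below is only an illustration for the (3, 6)-regular ensemble, with $\Lambda(z) = z^3$ and $\Gamma(z) = z^6$. All function names are hypothetical, and the initial condition of the recursion (messages equal to the channel observations) is an assumption of the sketch.

```python
def Lam(z): return z ** 3          # node-perspective variable dd, (3, 6) case
def dLam(z): return 3.0 * z ** 2
def Gam(z): return z ** 6          # node-perspective check dd
def dGam(z): return 6.0 * z ** 5


def fixed_point(eps, gamma, iterations=20000):
    """Iterate the M decoder DE recursion; returns (xs, xg, ys, yg)."""
    lam = lambda z: z ** 2         # edge-perspective dd pair
    rho = lambda z: z ** 5
    x0, xs, xg = 1.0 - eps, eps * (1.0 - gamma), eps * gamma
    ys = yg = 0.0
    for _ in range(iterations):
        y0 = rho(x0)
        ys = 1.0 - rho(x0 + xg)
        yg = 1.0 - y0 - ys
        x0 = 1.0 - eps * lam(ys + yg)
        xs = (1.0 - gamma) * eps * lam(ys)
        xg = eps * lam(ys + yg) - xs
    return xs, xg, ys, yg


def variable_term(eps, gamma, ys, yg):
    """Right-hand side of (21)."""
    s = ys + yg
    return (eps * (1.0 - gamma) * (dLam(s) * yg - Lam(s) + Lam(ys))
            + eps * gamma * dLam(s) * yg)


def check_term(xs, xg):
    """Right-hand side of (22)."""
    ratio = dLam(1.0) / dGam(1.0)
    return ratio * (dGam(1.0 - xs) * xg - Gam(1.0 - xs) + Gam(1.0 - xs - xg))


def F(eps, gamma, xs, xg, ys, yg):
    """The function F(x, eps, gamma) defined above."""
    ratio = dLam(1.0) / dGam(1.0)
    return (dLam(1.0) * (xs * (1.0 - ys) - (xs + xg) * (1.0 - ys - yg))
            - eps * (1.0 - gamma) * (Lam(ys + yg) - Lam(ys))
            + ratio * (Gam(1.0 - xs) - Gam(1.0 - xs - xg)))


eps, gamma = 0.46, 0.1
xs, xg, ys, yg = fixed_point(eps, gamma)
print(variable_term(eps, gamma, ys, yg) - check_term(xs, xg),
      F(eps, gamma, xs, xg, ys, yg))
```

The two printed values should agree up to floating-point error once the recursion has converged; away from a fixed point the identity is not expected to hold.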

Imagine now changing $\gamma \to \gamma + \Delta\gamma$ and computing the number
of new conditions on the newly guessed variables (whose
expected number was computed in the previous section). Call
$\Delta C$ the upper bound on their number provided by Lemma 11.
It is clear that, repeating the above derivation, we get

$$E[\Delta C] = F(x^\infty(\epsilon, \gamma + \Delta\gamma), \epsilon, \gamma + \Delta\gamma) - F(x^\infty(\epsilon, \gamma), \epsilon, \gamma + \Delta\gamma).$$

Consider now two separate possibilities. In the first case
$x^\infty(\epsilon, \gamma')$ is continuous (and therefore analytic) in the interval
$\gamma' \in [\gamma, \gamma + \Delta\gamma]$. By Taylor expansion we get

$$E[\Delta C] = \frac{\partial F}{\partial x}(x', \epsilon, \gamma + \Delta\gamma) \cdot \frac{\partial x^\infty}{\partial \gamma}(\epsilon, \gamma)\, \Delta\gamma + O((\Delta\gamma)^2),$$

with the gradient of F being evaluated at $x' = x^\infty(\epsilon, \gamma + \Delta\gamma)$.
A direct calculation shows that the gradient vanishes at this
point, leading to $E[\Delta C] = O((\Delta\gamma)^2)$.

In the second case, the interval $[\gamma, \gamma + \Delta\gamma]$ includes a discontinuity
point (a jump) $\gamma_j$. Let $x_{j+} = \lim_{\gamma \downarrow \gamma_j} x^\infty(\epsilon, \gamma)$
and $x_{j-} = \lim_{\gamma \uparrow \gamma_j} x^\infty(\epsilon, \gamma)$. We have

$$E[\Delta C] = F(x_{j+}, \epsilon, \gamma_j) - F(x_{j-}, \epsilon, \gamma_j) + O(\Delta\gamma).$$



H. Finishing the proof 

Consider now the guessing strategy explained in Section
VI-E. First the received message is decoded with the usual
iterative decoder. At this point $\gamma = 0$. Then each bit is selected
independently with probability $\Delta\gamma/(1 - \gamma)$ and guessed if its value
was not determined (possibly in terms of former guesses)
at previous stages. The M decoder is then run until a fixed
point is reached. The number of new guesses at this stage is
$\Delta G_\gamma$ and the number of new conditions is upper bounded by
$\Delta C_\gamma$. This operation is repeated until $\hat\mu_i(\infty) \in \{0, g\}$ for each
i. Without loss of generality, we may imagine this to happen
at $\gamma = 1$.

At this point each realization of the guesses compatible with
the conditions yields a codeword compatible with the received
message. We have

$$\lim_{n \to \infty} \frac{1}{n} E_{\mathcal{G}}[H_{\mathcal{G}}(X \mid Y)] \geq \sum_\gamma E[\Delta G_\gamma] - \sum_\gamma E[\Delta C_\gamma] = \int_0^1 \epsilon \Lambda(y_*(\gamma, \epsilon))\, d\gamma - \sum_j \Delta F_j + O(\Delta\gamma),$$

where the last sum runs over the jump positions $\gamma_j$ and
$\Delta F_j = F(x_{j+}, \epsilon, \gamma_j) - F(x_{j-}, \epsilon, \gamma_j)$ is the discontinuity of F at those
positions. In order to finish the proof of Lemma 9, notice
that $H(X \mid Y)$ does not depend upon $\Delta\gamma$ and we can therefore
take the limit $\Delta\gamma \to 0$, discarding $O(\Delta\gamma)$ terms. Moreover
$y_*(\gamma, \epsilon) = y(\epsilon(1 - \gamma))$ (with $y(\epsilon')$ denoting the fixed point
of DE for the usual BP decoder at erasure probability $\epsilon'$), and
therefore, substituting $\epsilon' = \epsilon(1 - \gamma)$,

$$\int_0^1 \epsilon \Lambda(y_*(\gamma, \epsilon))\, d\gamma = \int_0^\epsilon \Lambda(y(\epsilon'))\, d\epsilon'$$

is just the area under the BP EXIT curve (dark gray in panel (a)
of the corresponding figure). Finally, let $\epsilon_j = (1 - \gamma_j)\epsilon$ and let
$(x(\epsilon_j+), y(\epsilon_j+))$ and $(x(\epsilon_j-), y(\epsilon_j-))$ be the fixed points of DE
for the usual iterative decoder just above and below the jump. Then

$$\Delta F_j = P_{\epsilon_j}(x(\epsilon_j-), y(\epsilon_j-)) - P_{\epsilon_j}(x(\epsilon_j+), y(\epsilon_j+)),$$

where $P_\epsilon(x, y)$ is the trial entropy, cf. Def. 4. By the earlier
lemma on the trial entropy, $\Delta F_j$ is just the area delimited by the EBP
EXIT curve and a vertical line through the jump (dark gray
in panel (b)).

I. Maxwell Decoder: Illustration and Implementation

The Maxwell decoder provides an interpretation for the
balance of areas which we described in the preceding sections.
For many ensembles, e.g., the (3, 6)-regular ensemble,
Theorem 10 gives a complete characterization of the MAP
EXIT function and therefore a complete justification of the
Maxwell construction. In some other cases we are not quite
as lucky (see, e.g., the double-jump ensemble discussed in Example 10), and we
can only conjecture that the parts of the MAP EXIT function
which are not covered by Theorem 10 also follow the Maxwell
construction. Let us now review some typical cases.

Example 9 ((3,6) LDPC ensemble): Consider the
dd pair $(\lambda, \rho) = (x^2, x^5)$ and the corresponding LDPC
ensemble with design rate one-half. Its BP and MAP
EXIT functions are depicted in an earlier figure together with the
balance conditions. Fig. 19 shows the evolution of the
entropy H(t), i.e., the logarithm of the number of running
copies as discussed earlier, as a function of the fraction
of bits determined by the decoding process for the (3, 6)-regular
LDPC ensemble. Transmission takes place over
BEC($\epsilon = 0.46$), i.e., we fix the channel parameter $\epsilon$ so that
$\epsilon^{BP} \approx 0.4294 < \epsilon < \epsilon^{MAP} \approx 0.4882$. After transmission,
a fraction $1 - \epsilon = 0.54$ of bits is known. The classical
BP algorithm proceeds until it gets stuck at the fixed
point $(x_\epsilon \approx 0.3789, y_\epsilon \approx 0.9076)$ of DE. At this point
(point A in the figure), a fraction $1 - \epsilon\Lambda(y_\epsilon) \approx 0.6561$ of
bits has been determined. Now the guessing phase of the
M decoder starts. It ends at point B, which corresponds to
the BP threshold $(x^{BP} \approx 0.2606, y^{BP} \approx 0.7790)$. The total
fraction of guesses that the M decoder has to venture is
$\int_{x^{BP}}^{x_\epsilon} h(\epsilon(x))\, d\epsilon(x) = P(x_\epsilon, y_\epsilon) - P(x^{BP}, y^{BP})$. For our specific
example we have $P(x, y(x)) = -\frac{5}{2}x^2 + 10x^3 - \frac{25}{2}x^4 + 7x^5 - \frac{3}{2}x^6$,
so that the total fraction of guesses is equal to 0.0201509. For
a blocklength of n = 34000 this corresponds to roughly 685
guesses. At this point the BP decoding phase resumes. More
and more guesses are confirmed. Since we are operating
below the MAP threshold, (essentially) all guesses are
eventually confirmed and the M decoder comes to a halt.
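The numbers quoted in Example 9 can be recomputed with a few lines of Python. The sketch below is an illustration with hypothetical function names. It uses the trial-entropy polynomial of the (3, 6)-regular ensemble, which follows from integrating $h\, d\epsilon = (y(x) - 2x\, y'(x))\, dx$ along the EBP curve with $y(x) = 1 - (1 - x)^5$, and it locates the BP threshold by a plain grid search (a choice of this sketch, not a method taken from the text).

```python
def y_of(x):
    """Check-to-variable erasure probability for rho(x) = x^5."""
    return 1.0 - (1.0 - x) ** 5


def eps_of(x):
    """Channel parameter with fixed point x: eps(x) = x / lambda(y(x))."""
    return x / y_of(x) ** 2


def trial_entropy(x):
    """P(x, y(x)) for the (3, 6)-regular ensemble."""
    return (-2.5 * x ** 2 + 10.0 * x ** 3 - 12.5 * x ** 4
            + 7.0 * x ** 5 - 1.5 * x ** 6)


def bp_fixed_point(eps, iterations=5000):
    """Run BP density evolution x -> eps * lambda(y(x)) from x = eps."""
    x = eps
    for _ in range(iterations):
        x = eps * y_of(x) ** 2
    return x


def bp_threshold_point(grid=200000):
    """Grid search for the minimizer of eps(x) on (0, 1]."""
    return min((k / grid for k in range(1, grid + 1)), key=eps_of)


eps = 0.46
x_eps = bp_fixed_point(eps)
x_bp = bp_threshold_point()
determined = 1.0 - eps * y_of(x_eps) ** 3        # 1 - eps * Lambda(y_eps)
guesses = trial_entropy(x_eps) - trial_entropy(x_bp)
print(x_eps, y_of(x_eps), determined)
print(x_bp, y_of(x_bp), eps_of(x_bp))
print(guesses, 34000 * guesses)
```

Under these assumptions the output should reproduce, up to rounding, the fixed point A, the BP threshold point B, and the fraction of guesses stated in the example.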




Fig. 19. M decoder applied to the (3, 6)-regular LDPC ensemble. Asymptotic
entropy of the M decoder H (logarithm of the number of running copies) as
a function of the fraction of determined bits. 15 channel and code realizations
with $\epsilon = 0.46$ and blocklength $n = 34 \cdot 10^3$ are shown (dashed curves)
together with the analytic asymptotic curve (solid curve). The insets show
how the entropy curve can be constructed from the EXIT curve. The fraction of
guesses is shown in the 2 left-most insets while the fraction of contradictions
is shown in the 2 right insets.

Example 10 (Typical Double "Jump"): Consider the
dd pair $(\lambda, \rho) = \left(\frac{3x + 3x^2 + 4x^{13}}{10}, x^6\right)$ and the corresponding
LDPC ensemble with design rate $r = \frac{19}{39} \approx 0.4872$. Its BP
EXIT function is depicted in Fig. 5; its EBP EXIT curve
together with the balance conditions is shown in Fig. 3.



Fig. 20. M decoder applied to the (3, 6) LDPC ensemble: Expected asymptotic
entropy as a function of the fraction of determined bits at $\epsilon = 0.46$ (solid
curve) and empirical average entropy curves (gray curves). Simulations are
shown for n = 780 (average over $6 \cdot 10^4$ realizations), n = 3125 (average
over $16 \cdot 10^3$ realizations), n = 12500 (average over $4 \cdot 10^3$ realizations),
n = 50000 (average over $10^3$ realizations), n = 200000 (average over 150
realizations).

Finally, in an earlier example we have discussed how large parts of
the MAP EXIT curve can be constructed based on Theorem 10.
The MAP threshold is $\epsilon^{MAP} \approx 0.4913$ (at $x^{MAP} \approx 0.1434$).
According to the Maxwell construction, the second MAP
discontinuity occurs at $\epsilon^{MAP,2} \approx 0.5186$ (at $x^{MAP,2} \approx 0.2378$,
$\bar{x}^{MAP,2} \approx 0.4121$).

Fig. 21 shows the evolution of the entropy H(t) for $\epsilon =
0.5313$. This corresponds to the point C in Fig. 7, the first
point at which the counting argument no longer applies. By
comparing the result of the simulations to the analytic curve
corresponding to the Maxwell construction, we can see that, at
least empirically, the Maxwell construction seems to be valid
over the whole range.




Fig. 21. M decoder applied to the irregular "double-jump" LDPC ensemble
shown in Fig. 5. Asymptotic entropy as a function of the fraction of
determined bits at $\epsilon = 0.5313$ (point B). (a) 15 channel and code realizations
of blocklength n = 34000 are shown (dashed curves) together with the
analytic asymptotic curve (solid curve). (b) Convergence of the average
entropy curves (gray curves) to the analytic expected curve (solid curve).
Simulations are shown for n = 780 (average over $6 \cdot 10^4$ realizations),
n = 3120 (average over $16 \cdot 10^3$ realizations), n = 12480 (average over
$4 \cdot 10^3$ realizations), n = 50017 (average over $10^3$ realizations), n = 200500
(average over 250 realizations).



VII. Some Further Examples 

A. Special Cases 

Although (for the sake of simplicity) we did not discuss this
case in the previous sections, other curious (but frequent) examples
are those in which the number of discontinuities $J^{BP}$ of the
BP EXIT curve is not equal to the number of discontinuities
$J^{MAP}$ of the MAP EXIT curve. Examples 11 and 12 show two
such cases.

Example 11 ($J^{MAP} < J^{BP}$): Consider the dd pair $(\lambda, \rho)$ =
( x \ x , 3x 1 2 } 7x ) and the corresponding LDPC ensemble 
with rate $r \approx 0.5502$.




Fig. 22. When the numbers of BP and MAP "jumps" (respectively, $J^{BP}$
and $J^{MAP}$) are different: (a) BP EXIT function with $J^{BP} = 2$; (b) MAP
EXIT function with $J^{MAP} = 1$ and Maxwell construction.

The MAP EXIT curve has a single "jump" at $\epsilon^{MAP} \approx 0.4493$
($x^{MAP} \approx 0.4425$) whereas the BP EXIT curve has two such
singularities, at $\epsilon^{BP} \approx 0.2941$ ($x^{BP} \approx 0.05738$) and at
$\epsilon^{(BP,2)} \approx 0.3254$ ($x^{(BP,2)} \approx 0.2117$), as shown in Fig. 22. As shown in



Fig. 23. Function VEf=(u) for the dd pair formed by the residual ensemble 
at e MAP = 0.4493. 

Fig. 23, Theorem 10 applies at the MAP threshold and so
the whole MAP EXIT curve is determined by the counting
argument in this case. The Maxwell construction is therefore
confirmed in this case.

Example 12 ($J^{BP} < J^{MAP}$): Consider the dd pair
$(\lambda, \rho) = \left(\frac{3x + 3x^2 + 14x^{50}}{20}, x^{15}\right)$ and the corresponding LDPC ensemble
with design rate $r = \frac{311}{566} \approx 0.5495$. The BP EXIT curve has
a single "jump" at $\epsilon^{BP} \approx 0.3531$ ($x^{BP} \approx 0.3008$).

Unfortunately, Theorem 10 shows the tightness of the
M construction only up to point A (at $\epsilon \approx 0.5063$, see
Fig. 24). But it is quite natural to conjecture that the MAP
EXIT curve has two singularities, namely at $\epsilon^{MAP} \approx 0.3986$
($x^{MAP} \approx 0.0340$) and at $\epsilon^{(MAP,2)} \approx 0.4855$ ($x^{(MAP,2)} \approx 0.1096$), as
shown in Fig. 24. This is validated by the M decoder. Namely,
the M decoder gives a residual entropy (as a fraction of the
blocklength) of $H/n \approx 0.0121$ at $\epsilon = 0.44$. This value is exactly
the value of the area (between $\epsilon = 0$ and $\epsilon = 0.44$) under the
conjectured MAP EXIT curve. This shows that, between the
two conjectured MAP phase transitions, the M decoder follows
the part of the EBP EXIT function which is "hidden" from the
BP decoder. The Maxwell construction is conjectured to hold
in this case.

Fig. 24. When the numbers of BP and MAP "jumps" (respectively, $J^{BP}$
and $J^{MAP}$) are different: (a) BP EXIT function with $J^{BP} = 1$; (b) MAP
EXIT function with $J^{MAP} = 2$ and Maxwell construction.

B. Difference Between MAP and BP Threshold

Let r < 1 be the design rate. Consider a sequence of degree
distribution pairs $\{(\lambda(x), \rho(x)) = (x^{l-1}, x^{\frac{l}{1-r} - 1})\}_{l \geq 2}$ with
fixed design rate r. Ensembles associated to this sequence
are regular LDPC code ensembles. We have seen in Fact 1
that such ensembles have at most one jump and therefore
we expect our bound on the MAP threshold to be tight. It
was shown already in [38] that if l is increased then the
weight distribution of such ensembles converges to the one of
Shannon's random ensemble and, hence, the MAP threshold
of such ensembles converges to the Shannon limit. Using the
replica method, an explicit asymptotic expansion of the MAP
threshold was given in [39].

Let us give here an alternative proof of this fact using
our machinery. That the MAP threshold $\epsilon^{MAP}(l)$ converges
to the Shannon threshold is shown in Fact 3. On the other
hand, as stated in Fact 2, the BP threshold $\epsilon^{BP}(l)$ goes to
0 when $l \to \infty$. This shows that the two thresholds can be
arbitrarily far apart, and nevertheless the MAP EXIT curve can
be constructed from the corresponding (E)BP EXIT curve!

This is illustrated in Fig. 25 and the proofs are given in the
sequel.



Fig. 25. Regular BP EXIT entropy curves with design rate $r = \frac{1}{2}$. (a)
Channel entropy function $x \mapsto \epsilon^{(l)}(x)$. (b) EXIT curve $h^{(l)}(\epsilon) \leftrightarrow \epsilon^{(l)}(h)$.
The depicted ensembles are, in decreasing order, the (100, 200), the (35, 70),
the (12, 24), the (6, 12), the (4, 8), the (3, 6) and the (2, 4) regular ensemble.
While the BP threshold goes to 0, the bit MAP threshold goes to the Shannon
limit 0.5.

Lemma 12: For a fixed $x \in (0, 1]$, denoting

$$\epsilon^{(l)}(x) = \frac{x}{\left(1 - (1 - x)^{\frac{l}{1-r} - 1}\right)^{l-1}},$$

we get $\epsilon^{(l)}(x) \to x$ as $l \to \infty$.

Proof: This limit is classically obtained with
$(l - 1) \log[1 - (1 - x)^{\frac{l}{1-r} - 1}] \sim -(l - 1)(1 - x)^{\frac{l}{1-r} - 1} \to 0$, which
gives $(1 - (1 - x)^{\frac{l}{1-r} - 1})^{l-1} \to 1$. ■

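Fact 2: Consider the sequence $\{(x^{l-1}, x^{\frac{l}{1-r} - 1})\}_{l \geq 2}$ with
fixed rate r < 1; then the BP threshold $\epsilon^{BP}(l) \to 0$ as $l \to \infty$.

Proof: Consider first the BP threshold $\epsilon^{BP}(l) =
\min_x \{\epsilon^{(l)}(x)\}$. Fix $\xi > 0$ (very small). Clearly $0 \leq \epsilon^{BP}(l) \leq
\epsilon^{(l)}(\xi/2)$, and, since $\epsilon^{(l)}(\xi/2) \to \xi/2$ by Lemma 12, we can
state

$$\exists\, l_0 \in \mathbb{N},\ \forall\, l > l_0: \quad \epsilon^{(l)}(\xi/2) < \frac{\xi}{2} + \frac{\xi}{2}.$$

This gives that, for all $l > l_0$, the statement $0 \leq \epsilon^{BP}(l) < \xi$
holds. This is true for any fixed $\xi$, meaning $\epsilon^{BP}(l) \to 0$ as $l \to \infty$. ■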

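Instead of studying the parameterized EXIT quantity $h(x) =
(1 - (1 - x)^{\frac{l}{1-r} - 1})^{l}$, it is often more convenient to work directly
with the inverse mapping $h \mapsto x(h) = 1 - [1 - h^{\frac{1}{l}}]^{\frac{1-r}{l-1+r}}$ such
that we can eventually use

$$\epsilon(h) = \frac{1 - (1 - h^{\frac{1}{l}})^{\frac{1-r}{l-1+r}}}{h^{\frac{l-1}{l}}}$$

for $h \in (0, 1]$.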



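Lemma 13: For a fixed $h \in (0, 1)$, we have

$$\epsilon(h) = \frac{1 - (1 - h^{\frac{1}{l}})^{\frac{1-r}{l-1+r}}}{h^{\frac{l-1}{l}}} \to 0 \quad \text{as } l \to \infty.$$

Proof: The second term of the numerator goes to 1 since

$$\log(1 - h^{\frac{1}{l}}) = \frac{\log h}{l} + \log(h^{-\frac{1}{l}} - 1) = \frac{\log h}{l} + \log\Big(\frac{-\log h}{l} + O\Big(\frac{1}{l^2}\Big)\Big),$$

such that $\frac{1-r}{l-1+r} \log(1 - h^{\frac{1}{l}}) \to 0$. The
lemma follows from $h^{\frac{l-1}{l}} \to h > 0$. ■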

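Fact 3: Consider again the sequence $\{(x^{l-1}, x^{\frac{l}{1-r} - 1})\}_{l \geq 2}$
with fixed rate r < 1; then $\epsilon^{MAP}(l) \to \epsilon^{Sh} = 1 - r > 0$ as $l \to \infty$.

Proof: First, the inequality $0 \leq \epsilon^{Sh} - \epsilon^{MAP}(l)$ holds from
the Area Theorem.⁹ Second,

$$\epsilon^{Sh} - \epsilon^{MAP}(l) = (1 - r) - \epsilon^{MAP}(l) = \mathcal{A}_1^{(l)} \leq \mathcal{A}_2^{(l)},$$

where, in short, $\mathcal{A}_1^{(l)}$ represents the closed area between
$\{\epsilon(h)\}_{\epsilon^{BP} < \epsilon < 1}$, the horizontal axis $\{\epsilon = \epsilon^{MAP}\}$ and the vertical
axis $\{h = 1\}$. The area $\mathcal{A}_2^{(l)}$ is the surface of the unit
square which lies under $\{\epsilon(h)\}_{0 < \epsilon < 1}$. Now, consider the
function $\tilde\epsilon^{(l)}(h) = \min\{\epsilon^{(l)}(h), 1\} \leq 1$. The Dominated
Convergence Theorem¹⁰ applied to the sequence $\{\tilde\epsilon^{(l)}\}$ gives that
$\lim_{l \to \infty} \mathcal{A}_2^{(l)} = 0$, which concludes the proof. ■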
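Facts 2 and 3 can be illustrated numerically for rate one-half regular ensembles. The Python sketch below is hedged in two ways. First, the function name `regular_thresholds` is hypothetical. Second, the MAP value it returns is the threshold predicted by the Maxwell construction (the zero of the trial entropy along the EBP curve), which the text expects, but does not prove in full generality, to be tight. For an $(l, r)$-regular ensemble the trial entropy used is $P(x) = l\,[x - (1 - (1 - x)^r)/r] - (l - 1)\, x\, [1 - (1 - x)^{r-1}]$, obtained by integrating $h\, d\epsilon = (y - (l - 1) x y')\, dx$. In this code `r` denotes the check degree, not the rate.

```python
def regular_thresholds(l, r, grid=20000, bisection_steps=60):
    """Return (BP threshold, Maxwell-construction MAP threshold).

    l is the variable degree and r the check degree of a regular ensemble.
    """
    def y(x):
        return 1.0 - (1.0 - x) ** (r - 1)

    def eps(x):
        return x / y(x) ** (l - 1)

    def trial_entropy(x):
        return l * (x - (1.0 - (1.0 - x) ** r) / r) - (l - 1) * x * y(x)

    # BP threshold: minimum of eps(x) over a grid on (0, 1].
    x_bp = min((k / grid for k in range(1, grid + 1)), key=eps)
    # MAP point: zero of the trial entropy on [x_bp, 1], found by bisection.
    lo, hi = x_bp, 1.0
    for _ in range(bisection_steps):
        mid = 0.5 * (lo + hi)
        if trial_entropy(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return eps(x_bp), eps(hi)


for l in (3, 4, 6, 12):
    bp, map_ = regular_thresholds(l, 2 * l)
    print(l, 2 * l, bp, map_)
```

With growing degree the first threshold should decrease while the second approaches the Shannon limit 0.5 from below.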

C. Application to other Iterative Coding Schemes 

Although LDPC ensembles have been used to present
the discussed concepts, the picture is not limited to such
ensembles. Equivalent statements are expected to hold in large
generality.

To give just one example, consider generalized LDPC
(GLDPC) ensembles: part of our results, e.g., Lemma 3, can be
applied directly. Consider a GLDPC ensemble:
Equivalently to the dd pair $(\lambda, \rho)$, the pair $(\lambda(x), y(x)) =
(\lambda(x), 1 - \rho(1 - x))$ suffices to describe the BP decoding of the
ensemble in the asymptotic limit. The left (right) component of
the pair $(\lambda(x), y(x))$ gives the EXIT entropy outgoing from the
left (right) nodes during the BP decoding. To be more precise,
at a fixed channel parameter $\epsilon$, the function $x(y) = \epsilon\lambda(y)$ is
the EXIT entropy outgoing from the left and $y(x) = y(x, \epsilon)$ is
the EXIT entropy outgoing from the right.¹¹ A short calculation
leads, in general, to an expression for the right
component EXIT entropy (see, e.g., [40], [41]).

⁹ An alternative way is to show it via the Shannon Coding Theorem.
¹⁰ Observe that $\epsilon(h)$ does not uniformly converge to 0 on (0, 1) since
$\int_0^1 \epsilon(h)\, dh = 1 - r \neq 0$.

Example 13 (GLDPC Codes): Generalized LDPC codes
(see, e.g., [42]-[44]) are LDPC codes whose check nodes
are replaced by some more complex linear constraints. Such
constraints are viewed as component codes which typically
have minimum distance $d_{\min} \geq 3$: they are bit MAP decoded
and the component EXIT entropy y(x) has smallest degree
$d_{\min} - 1$ (see, e.g., [41]). The EXIT entropy y(x) is the
function $y(x) = E\big[\frac{1}{r} \sum_{i=1}^{r} y_i(x)\big]$, where r is the length of
a particular component code and where the expectation is
taken with respect to all such component codes. The distribution
$\lambda$ can be freely chosen but must satisfy the design
rate constraint $r = 1 - \frac{1 - \int y}{\int \lambda}$, where $\int y$ is the rate of
the average component code (Area Theorem). For example,
consider GLDPC ensembles using $[2^p - 1, 2^p - p - 1, 3]$ binary
Hamming codes as component codes. Then, when $d_{\min} \geq 3$,
the BP EXIT entropy has at least one discontinuity at the BP
threshold. It is given as

$$(\epsilon, h) = \left(\frac{x}{\lambda(y(x))}, \Lambda(y(x))\right).$$

Theorem 3 shows that, in general, $\epsilon^{BP} \neq \epsilon^{MAP}$ (the BP threshold
not being given by the stability condition whenever the right
component code has $d_{\min} \geq 3$). In the next table, the first
example uses [7, 4, 3] Hamming codes such that its design
rate is $r = \frac{1}{7}$ with the pair $(\lambda, y) = (x, 3x^2 + 4x^3 - 15x^4 +
12x^5 - 3x^6)$, whereas the second example uses the [15, 11, 3]
Hamming code. It can be observed that these classical GLDPC
ensembles have a relatively bad BP threshold compared to their MAP
upper bound. In the third example, $d_{\min}$ is no longer > 2 since
we choose, in the node perspective, a mixture of 40 percent
of [7, 6, 2] single parity-check codes, 40 percent of [7, 4, 3]
Hamming codes and 20 percent of [15, 11, 3] Hamming codes.
The BP EXIT function, however, still has a discontinuity at the
BP threshold.



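y(x) (component codes)   $\lambda(x)$               $\epsilon^{BP}$   $\epsilon^{MAP}$ (upper bound)   $\epsilon^{Sh}$
[7, 4, 3]                $x$                        0.75645   0.85616                  0.85714
[15, 11, 3]              $x$                        0.46785   0.52780                  0.53333
mixture                  $\frac{3x + 7x^2}{10}$     0.70483   0.71301                  0.72801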
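The first row of the table can be recomputed from the pair $(\lambda, y)$ given in Example 13. The Python sketch below is illustrative only: the helper names are hypothetical, polynomials are stored as exponent-to-coefficient dictionaries for convenience, and the BP threshold is obtained as the minimum of $\epsilon(x) = x/\lambda(y(x))$ by a simple grid search.

```python
from fractions import Fraction

# y(x) for the [7, 4, 3] Hamming component code and lambda(x) = x,
# stored as {exponent: coefficient}.
HAMMING_7_4_EXIT = {2: 3, 3: 4, 4: -15, 5: 12, 6: -3}
LAMBDA = {1: 1}


def evaluate(poly, x):
    """Evaluate a polynomial given as {exponent: coefficient}."""
    return sum(c * x ** k for k, c in poly.items())


def integral_0_1(poly):
    """Exact integral of the polynomial over [0, 1]."""
    return sum(Fraction(c, k + 1) for k, c in poly.items())


def design_rate(lam, y):
    """Design rate r = 1 - (1 - int y) / int lambda."""
    return 1 - (1 - integral_0_1(y)) / integral_0_1(lam)


def bp_threshold(lam, y, grid=100000):
    """Minimum over a grid on (0, 1] of eps(x) = x / lambda(y(x))."""
    return min((k / grid) / evaluate(lam, evaluate(y, k / grid))
               for k in range(1, grid + 1))


print(design_rate(LAMBDA, HAMMING_7_4_EXIT))
print(1 - design_rate(LAMBDA, HAMMING_7_4_EXIT))
print(bp_threshold(LAMBDA, HAMMING_7_4_EXIT))
```

Under these assumptions the printed values should match the design rate of Example 13 and the first and last entries of the [7, 4, 3] row; the MAP upper bound in the middle column is not recomputed here.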



VIII. Conclusion 

We have shown that there is a close connection between
the BP and the MAP decoder. While this connection is quite
general, we focused in this paper on communication over the
binary erasure channel. In this case, the relation is furnished
by the so-called Maxwell decoder which gives an operational
meaning to the various areas under the EBP EXIT curve as
number of guesses and number of confirmations. Unfortunately,
this paper falls slightly short on several accounts of
proving this relationship in the most general case. Let us
summarize what seem to be the most important issues that
still need to be addressed.

¹¹ Contrary to the left nodes which stay simple repetition codes, the right
nodes can be more complex linear codes. Therefore, y(x) often depends on
the edge type. For GLDPC ensembles, we consider the average over all types
of node. For Turbo codes, one usually distinguishes between systematic and
parity bits.

First, there is currently no direct proof which establishes 
the existence of the asymptotic MAP EXIT curve. Rather, 
the existence follows from the explicit characterization of this 
limit. This occurs via Theorem [H)] in all those cases where 
the conditions of the theorem are fulfilled. Although these
conditions apply to a large class of ensembles, it would be 
pleasing to show the existence of the limit in the general case. 

A further point that needs some clarification is the restriction
we had to impose in the second proof of Theorem 8. Recall
that the argument on the computation tree via the Area Theorem
required that the underlying ensemble have a non-trivial
stability condition, since otherwise part of the EBP EXIT curve
lies "outside the unit box," i.e., part of the curve corresponds
to "erasure probabilities above one." While an analytic proof
of Theorem 8 is possible, it would be interesting (especially
in view of generalizations) to have a conceptual proof valid
for unconditionally stable ensembles.

Without doubt the most important challenge is to establish
the correctness of our conjecture. This would yield an easy
and geometrically pleasing way of constructing the MAP
EXIT curve from the EBP EXIT curve in the general case.

Finally, an interesting research direction consists in the anal- 
ysis of more general combinatorial search problems through 
a suitable 'Maxwell construction'. An example (extremely 
close to the topic of this paper) consists in the problem of 
satisfiability of random sparse linear systems ('XORSAT') 
considered in [45], [46]. The counting argument presented 
in Section V is indeed closely related to the approach of these
papers. The ideas presented here can probably be used to
analyze the behavior of simple resolution algorithms for this
problem (see [47] for a numerical exploration).

Appendix I
Proofs for Concentration Theorems

Throughout this section, we use the shorthand $H_n =
H_G(X \mid Y)$ to denote the conditional entropy under transmission
over the BMS channel $P_{Y_{[n]} \mid X_{[n]}}(\cdot \mid \cdot)$ using a code $G$ chosen
uniformly at random from LDPC$(n, \lambda, \rho)$.



A. Concentration of the Conditional Entropy 

Fix an arbitrary order for the $m = (1 - r)n$ parity-check
nodes, and let $G_t$, $t \in [m]$, be a random variable describing the
first $t$ parity-check equations. Furthermore, let $G_0$ be a trivial
(empty) random variable. Define the Doob martingale $Z_t =
\mathbb{E}[H_n \mid G_t]$. The martingale property $\mathbb{E}[Z_{t+1} \mid Z_0, \dots, Z_t] =
Z_t$ follows by construction. In order to stress that $Z_t$ is a
(deterministic) function of the random variable $G_t$, we will
write $Z_t = Z(G_t)$. Obviously, $Z_0 = \mathbb{E}[H_n]$ is the expected
conditional entropy over the code ensemble, and $Z_m = H_n =
H_G(X \mid Y)$ is the conditional entropy for a random code
$G$. Theorem 4 therefore follows from the Hoeffding-Azuma
inequality, once we bound the differences $|Z_{t+1} - Z_t|$. This is
our aim in the remainder of this subsection.

Assume, for the sake of definiteness, that parity-checks have
been ordered by increasing degree. The first $m_1$ of them have
degree $r_1$, the successive $m_2$ have degree $r_2$, and so on, with
$r_1 < r_2 < \dots$. The $(t+1)$-th parity-check will therefore
have a well-defined degree, to be denoted by $r$. Consider two
realizations $G_{t+1}$ and $G'_{t+1}$ of the first $(t+1)$ parity-checks
which differ only in the $(t+1)$-th check. Let $G$ be a code
uniformly distributed over LDPC$(n, \lambda, \rho)$ whose restriction to
the first $(t+1)$ parity-checks coincides with $G_{t+1}$. Construct a
new code $G'$ whose restriction to the first $(t+1)$ parity-checks
is $G'_{t+1}$, and which differs from $G$ in at most $(r+1)$ parity-checks.
This can be done by the 'switching' procedure of [17].
This switching procedure results in a "pairing up" of graphs.
In order to obtain the desired result, it is now enough to show
that $|H_G(X \mid Y) - H_{G'}(X \mid Y)| \leq \alpha$, for some $n$-independent
constant $\alpha$.

Let us focus on the variation in conditional entropy under
the addition of a single parity-check. Let $G$ be a generic linear
code and let $G+1$ be the same code with the added constraint
that $x_{i_1} \oplus \dots \oplus x_{i_r} = 0$. Define the corresponding parity bit
$\tilde{X} = X_{i_1} \oplus \dots \oplus X_{i_r}$. Then

\begin{align*}
H_G(X \mid Y) &= H_G(X \mid \tilde{X}, Y) + H_G(\tilde{X} \mid Y) - H_G(\tilde{X} \mid X, Y) \\
&= H_G(X \mid \tilde{X} = 0, Y) + H_G(\tilde{X} \mid Y) \\
&= H_{G+1}(X \mid Y) + H_G(\tilde{X} \mid Y).
\end{align*}

The second equality follows since $H_G(\tilde{X} \mid X, Y) = 0$ and by
using the channel symmetry. The third step is a consequence
of the definition of $G+1$. Since $\tilde{X}$ is a bit, its entropy is
between 0 and 1 and therefore

\[ |H_G(X \mid Y) - H_{G+1}(X \mid Y)| \leq 1. \tag{23} \]

Recall that $G$ and $G'$ differ in at most $(r+1)$ parity-checks,
where $r$ is upper bounded by $r_{\max}$, the maximal check-node
degree. Equation (23) implies $|H_G(X \mid Y) - H_{G'}(X \mid Y)| \leq
(r+1)$ and, therefore, Theorem 4.
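
For the erasure channel, the single-check bound (23) can be checked directly on small examples. The Python sketch below is an illustration only and not part of the proof: it assumes the BEC, where for a fixed erasure set $E$ the conditional entropy of a linear code equals $|E|$ minus the GF(2) rank of the parity-check matrix restricted to the columns in $E$. The helper names, the code size and the way random checks are drawn are all hypothetical choices.

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    basis = {}  # leading-bit position -> stored row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # cancel the leading bit and keep reducing
    return len(basis)

def conditional_entropy(checks, erased_mask):
    """H(X | Y = y) in bits on the BEC: |E| - rank(H restricted to E)."""
    restricted = [row & erased_mask for row in checks]
    return bin(erased_mask).count("1") - gf2_rank(restricted)

def random_check(n, degree, rng):
    """Bitmask of a parity check on `degree` distinct positions out of n."""
    mask = 0
    for pos in rng.sample(range(n), degree):
        mask |= 1 << pos
    return mask

rng = random.Random(0)
n, m, degree = 24, 12, 6
gaps = set()
for _ in range(200):
    code = [random_check(n, degree, rng) for _ in range(m)]
    extra = random_check(n, degree, rng)   # the added parity check of G + 1
    erased = rng.getrandbits(n)            # a random erasure pattern
    gaps.add(conditional_entropy(code, erased)
             - conditional_entropy(code + [extra], erased))
print(sorted(gaps))
```

Adding one row can raise the rank by at most one, so every recorded difference lies in $\{0, 1\}$, which is the pattern-by-pattern version of (23).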

B. Concentration of the Derivative of the Conditional Entropy 

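It is convenient to introduce the per-bit conditional entropy
$h_n(\epsilon) = \frac{1}{n} H_G(X \mid Y)$ and its expected value $\bar{h}_n(\epsilon) =
\frac{1}{n} \mathbb{E}\, H_G(X \mid Y)$ when $G$ is a random code drawn uniformly from
the LDPC$(n, \lambda, \rho)$ ensemble.

Since the channel family $\{\text{BMS}(\epsilon)\}_{\epsilon \in I}$ is smooth and
ordered by physical degradation, $h_n(\epsilon)$ is a differentiable convex
function of $\epsilon \in I$. Therefore

\[ \frac{1}{\Delta}\big[h_n(\epsilon) - h_n(\epsilon - \Delta)\big] \leq h'_n(\epsilon) \leq \frac{1}{\Delta}\big[h_n(\epsilon + \Delta) - h_n(\epsilon)\big], \tag{24} \]

for any $\Delta > 0$ such that $[\epsilon - \Delta, \epsilon + \Delta] \subseteq I$. Because of
Theorem 4 we also have

\[ \frac{1}{\Delta}\big[\bar{h}_n(\epsilon) - \bar{h}_n(\epsilon - \Delta) - 2\xi\big] \leq h'_n(\epsilon) \leq \frac{1}{\Delta}\big[\bar{h}_n(\epsilon + \Delta) - \bar{h}_n(\epsilon) + 2\xi\big], \]

with probability greater than $1 - A e^{-nB\xi^2}$ (it follows from
the proof in the previous subsection that $A$ and $B$ can be
chosen uniformly in $\epsilon$). By averaging (24) over the code $G$,
and subtracting it from the last equation, we get

\[ |h'_n(\epsilon) - \bar{h}'_n(\epsilon)| \leq \frac{1}{\Delta}\big[\bar{h}_n(\epsilon + \Delta) - 2\bar{h}_n(\epsilon) + \bar{h}_n(\epsilon - \Delta) + 2\xi\big], \]

which, using the convexity of $\bar{h}_n(\epsilon)$, and fixing $\Delta = \xi^{1/2}$,
implies

\[ |h'_n(\epsilon) - \bar{h}'_n(\epsilon)| \leq \big[\bar{h}'_n(\epsilon + \xi^{1/2}) - \bar{h}'_n(\epsilon - \xi^{1/2})\big] + 2\xi^{1/2}. \]

The functions $\bar{h}_n$ are differentiable and convex and (by hypothesis)
they converge to $h(\epsilon) = h^{\text{MAP}}(\epsilon) = \lim_{n \to \infty} \frac{1}{n} \mathbb{E} H_n$,
which is differentiable in $J$. It is a standard result in convex
analysis (see [48]) that the derivatives $\bar{h}'_n$ converge to $h'$
uniformly in $J$. Therefore, there exists a sequence $\delta_n \to 0$,
such that

\[ |h'_n(\epsilon) - \bar{h}'_n(\epsilon)| \leq \big[h'(\epsilon + \xi^{1/2}) - h'(\epsilon - \xi^{1/2})\big] + \delta_n + 2\xi^{1/2}, \]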

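with probability greater than $1 - A e^{-nB\xi^2}$. In order to complete
the proof, it is sufficient to let $\tilde{\xi}(\xi)$ be the largest value
of $\tilde{\xi}$ such that $[h'(\epsilon + \tilde{\xi}^{1/2}) - h'(\epsilon - \tilde{\xi}^{1/2})] + 2\tilde{\xi}^{1/2} \leq \xi/2$,
and to apply the previous bound with $\tilde{\xi}(\xi)$ in place of $\xi$.
Then the thesis holds with $\alpha_\xi = B\,\tilde{\xi}(\xi)^2/2$. In particular,
if $h(\epsilon)$ is twice differentiable with respect to $\epsilon \in J$, then
$[h'(\epsilon + \tilde{\xi}^{1/2}) - h'(\epsilon - \tilde{\xi}^{1/2})] \leq C\,\tilde{\xi}^{1/2}$ for some constant $C$, and $\alpha_\xi \geq A\,\xi^4$.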
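
The convexity sandwich (24) on which this proof rests is easy to visualize on a toy example. The Python sketch below is an illustrative assumption: it uses the length-$n$ repetition code on the BEC, for which $H(X \mid Y) = \epsilon^n$ (one bit remains unknown exactly when all $n$ symbols are erased), so that $h_n(\epsilon) = \epsilon^n / n$ is convex with derivative $\epsilon^{n-1}$. The function names and the chosen values of $\epsilon$ and $\Delta$ are hypothetical.

```python
def h(eps, n=5):
    """Per-bit conditional entropy of the length-n repetition code on the BEC."""
    return eps ** n / n

def h_prime(eps, n=5):
    """Exact derivative of h with respect to eps."""
    return eps ** (n - 1)

def sandwich(eps, delta, n=5):
    """Lower and upper finite-difference bounds on h'(eps), as in (24)."""
    lower = (h(eps, n) - h(eps - delta, n)) / delta
    upper = (h(eps + delta, n) - h(eps, n)) / delta
    return lower, upper

eps = 0.6
widths = []
for delta in (0.2, 0.1, 0.05, 0.01):
    lower, upper = sandwich(eps, delta)
    # Convexity guarantees that the derivative lies between the two slopes.
    assert lower <= h_prime(eps) <= upper
    widths.append(upper - lower)
    print(f"delta={delta}: {lower:.5f} <= {h_prime(eps):.5f} <= {upper:.5f}")
```

The width of the sandwich shrinks with $\Delta$, which is why the proof trades off $\Delta$ against the concentration parameter by setting $\Delta = \xi^{1/2}$.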

Appendix II 

Proofs of Lemmas in the Counting Argument 

A. Proof of Lemma® 

Let $G(t)$ denote the residual graph after $t$ iterations of the
message passing decoder, and $\Xi_{G(t)} = (\Lambda_{G(t)}, \Gamma_{G(t)})$ be the
corresponding degree distribution pair. Moreover, denote by
$\Xi_t = (\Lambda_t, \Gamma_t)$ the typical degree distribution pair of $G(t)$.
Explicitly

\begin{align*}
\Lambda_t(z) &= \Lambda(z x_t), \\
\Gamma_t(z) &\triangleq \Gamma(1 - y_t + z y_t) - \Gamma(1 - y_t) - z y_t \Gamma'(1 - y_t),
\end{align*}

where $x_t, y_t$ denote the typical fractions of erased messages
after $t$ iterations of the decoder. These are obtained by solving
the density evolution equations $x_{t+1} = \epsilon \lambda(y_t)$, $y_{t+1} = 1 -
\rho(1 - x_t)$ with initial condition $x_0 = y_0 = 1$.
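
The density evolution recursion just stated can be iterated directly. The following Python sketch is a minimal illustration: the function name `density_evolution`, the iteration count and the choice of the (3,6)-regular ensemble ($\lambda(x) = x^2$, $\rho(x) = x^5$) are assumptions made for the example and are not taken from the proof.

```python
def density_evolution(eps, lam, rho, iterations=2000):
    """Iterate x_{t+1} = eps * lam(y_t), y_{t+1} = 1 - rho(1 - x_t), x_0 = y_0 = 1."""
    x, y = 1.0, 1.0
    for _ in range(iterations):
        # Both updates use the values from the previous iteration.
        x, y = eps * lam(y), 1.0 - rho(1.0 - x)
    return x, y

# Assumed example: the (3,6)-regular ensemble in the edge perspective.
lam = lambda z: z ** 2
rho = lambda z: z ** 5

x_low, y_low = density_evolution(0.40, lam, rho)    # below the BP threshold
x_high, y_high = density_evolution(0.45, lam, rho)  # above the BP threshold
print("eps = 0.40:", x_low, y_low)
print("eps = 0.45:", x_high, y_high)
```

Below the BP threshold of this ensemble the recursion collapses to $x = y = 0$, whereas above it the pair $(x_t, y_t)$ settles at a non-trivial fixed point $(x, y)$, which is the limit used in the argument that follows.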
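Notice that

\[ d(\Xi_\epsilon, \Xi_{G(\epsilon)}) \leq d(\Xi_\epsilon, \Xi_t) + d(\Xi_t, \Xi_{G(t)}) + d(\Xi_{G(t)}, \Xi_{G(\epsilon)}). \]

We claim that

\begin{align}
\lim_{t \to \infty} d(\Xi_\epsilon, \Xi_t) &= 0, \tag{25} \\
\lim_{n \to \infty} \mathbb{E}[d(\Xi_t, \Xi_{G(t)})] &= 0, \tag{26} \\
\lim_{t \to \infty} \lim_{n \to \infty} \mathbb{E}[d(\Xi_{G(t)}, \Xi_{G(\epsilon)})] &= 0. \tag{27}
\end{align}

Before proving those claims, let us show that they imply the
thesis. It follows from the triangular inequality above that
$\lim_{t \to \infty} \lim_{n \to \infty} \mathbb{E}\, d(\Xi_\epsilon, \Xi_{G(\epsilon)}) = 0$. But $d(\Xi_\epsilon, \Xi_{G(\epsilon)})$ does
not depend upon $t$, therefore

\[ \lim_{n \to \infty} \mathbb{E}[d(\Xi_\epsilon, \Xi_{G(\epsilon)})] = 0. \]

This in turn implies the thesis via the Markov inequality.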

We must now prove the claims (25) to (27). The first
one is a trivial consequence of the convergence of DE to its
fixed point, $\lim_{t \to \infty} x_t = x$, $\lim_{t \to \infty} y_t = y$, together with
the continuity of the expressions for $\Lambda_t$ and $\Gamma_t$ in $x$, $y$. Eq. (26)
follows from the general concentration analysis in [17].

In order to prove (27), consider a variable node $i$ in the
residual graph and imagine changing the received symbol at $i$
and updating all the messages accordingly. Consider the edges
whose distance from $i$ is larger than $t$, and denote by $W_i^{(t)}$ the
number of messages on such edges that change value after
the received symbol at $i$ has been changed. It is clear that

\[ \mathbb{E}[d(\Xi_{G(t)}, \Xi_{G(\epsilon)})] \leq \mathbb{E}[W_i^{(t)}]. \tag{28} \]

The limit $\lim_{n \to \infty} \mathbb{E}[W_i^{(t)}]$ can be computed through a branching
process analysis. The calculation is very similar to the one
in [49] and we do not reproduce it here. The final result is
that, as long as $\epsilon \lambda'(y) \rho'(1 - x) < 1$, there exist two positive
constants $A$, $b$ with $b < 1$ such that $\mathbb{E}[W_i^{(t)}] \leq A\, b^t$. The proof
is finished by noticing that the condition $\epsilon \lambda'(y) \rho'(1 - x) < 1$
is satisfied whenever $\epsilon$ is a continuity point of $x(\epsilon)$.

B. Proof of Lemma |S] 

Notice that the function $u \mapsto v(u)$ defined in (13) enjoys
the property $v(1/u) = 1/v(u)$ for any $u > 0$. Assume
ad absurdum that $\Psi_\Xi$ does not achieve its maximum in
the interval $[0, 1]$. Therefore, there exists $u > 1$ such that
$\Psi_\Xi(u') < \Psi_\Xi(u)$ for any $u' \in [0, 1]$. We will show that
$\Psi_\Xi(1/u) \geq \Psi_\Xi(u)$, thus reaching a contradiction. In fact, some
algebra shows that



tts(l/u) = -A'(l)]og a 

; Ai iog 2 

V(l) 



(1 + uv) 



(1 



E- 

1 



u)(l 

J_ „,1 



F(l) 

The claim follows from < 



2(1- 







1 + 


\v + l) ] 







v-1 
v+1 



< 1 together with the 



monotonicity of the logarithm. 
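
The inversion symmetry of $v$ used at the start of this proof can be checked numerically. The Python sketch below rests on an explicit assumption: it takes $v(u) = \big(\sum_l \lambda_l / (1 + u^l)\big)^{-1} \sum_l \lambda_l u^{l-1} / (1 + u^l)$, a form that is consistent with the stated property but is supplied here for illustration rather than copied from the definition in the main text. The degree distribution in `lam` is likewise a hypothetical example.

```python
def v(u, lam):
    """Assumed form of v(u); lam maps a left degree l to its edge fraction lambda_l."""
    numerator = sum(w * u ** (l - 1) / (1.0 + u ** l) for l, w in lam.items())
    denominator = sum(w / (1.0 + u ** l) for l, w in lam.items())
    return numerator / denominator

lam = {2: 0.3, 3: 0.7}  # hypothetical irregular left degree distribution

# v(1/u) * v(u) should equal 1 for every u > 0 if the symmetry holds.
products = [v(1.0 / u, lam) * v(u, lam) for u in (0.1, 0.5, 1.0, 2.0, 7.5)]
print(products)
```

Under this assumed form the identity follows by multiplying the numerator and the denominator of $v(1/u)$ by $u^l$ term by term, and the products printed above equal one up to rounding.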

In order to prove the second claim, i.e., the regularity of $\Psi_\Xi$
with respect to the dd pair, write $\Psi_\Xi(u) = \Psi^{(1)}_\Xi(u) + \Psi^{(2)}_\Xi(u) + \Psi^{(3)}_\Xi(u)$,
with $\Psi^{(1,2,3)}_\Xi$ the three summands in (12). The estimate (15)
can be proved for each of the three terms separately. Here,
we limit ourselves to considering $\Psi^{(1)}_\Xi(u)$, the derivation being
nearly identical for the two other summands. Start by noticing
that, for any $u \in [0, 1]$ and any dd pair $\Xi$, we have



1 x Ai 

^ 1 + u 1 ~ 



< 



^ 1 + u 1 - 



Now fix two dd pair 5 and 3. Let v(u) and v(u) the 
corresponding functions defined as in Jl 31 . Notice that 



?T^-E(irsr-5)<*-*> 

<i|!i (1 _„ ) ^|A 1 -A 1 | 



< 



1 



c (l-«)d(H,H) 



Using these inequalities, some calculus shows that 

1 > v(u), v(u) > 1 - 2 l max (l - u) , 
\v(u)-v(u)\<31 2 max (l-«)d(S,S). 



Next notice that, if we set f(u,v) = log 2 
for any u, v, v <E [0, 1], we have 



2(1- 



(l+u)(l+u) 



then, 



!/(«,«)! < 



log 2 



\f( u ,v)-f( u ,v)\<^-^\v-v\. 
Using these observations we obtain 

|*s(«) - *g(«)l < nua[/(u, «),/(«,«)] |A'(1) - A'(l)| 
+ max[A'(l),A'(l)]\f(u,v)-f(u,v)\ 

<^(1-«) 2 |A'(1)-A'(1)| 



log 2 
1 



(1 — u)\v — v\ 



log 2 

<Ai(l-u) 2 d(3,S), 
which confirms our thesis with constant A\ 



3lf nax )/ log 2. The variations of ^l 2 ' and ty^ 1 are bounded 
analogously. 

Appendix III 
Area and BP EXIT 

A. Two Useful Tricks 

We give here two lemmas which contain the two computational
tricks used throughout this paper. Lemma 14
and Lemma 15 will be used again in the next subsection of
the appendix. Observe that the function $x \mapsto h = \Lambda(y(x))$
is composed of two functions $y$ and $\Lambda$ which are strictly
increasing over $[0, 1]$. Therefore, the inverse function $x(h)$
exists and $h \mapsto x(h) = y^{-1} \circ \Lambda^{-1}(h)$ is a continuous and
strictly increasing bijection from $[0, 1]$ to $[0, 1]$. The values
$\epsilon(x) = \frac{x}{\lambda(y(x))}$ can then equivalently be described by

\[ \epsilon(h) = \frac{y^{-1} \circ \Lambda^{-1}(h)}{\lambda \circ \Lambda^{-1}(h)}. \]

Lemma 14: Consider a dd pair $(\lambda, \rho)$ and any couple
$(x_a, x_b) \in [0, 1]^2$. With the notations $h_a = h(x_a) = \Lambda \circ y(x_a)$
and $h_b = h(x_b)$, we can then write

\[ \int_{h_a}^{h_b} \epsilon(h)\,dh = \frac{1}{\int \lambda} \Big( x_b\, y(x_b) - x_a\, y(x_a) - \int_{x_a}^{x_b} y(x)\,dx \Big). \]

Proof: This is a simple integration by parts once it has
been observed that $\epsilon(x) \cdot \frac{dh}{dx} = \frac{x}{\lambda \circ y(x)} \cdot (\Lambda \circ y)'(x) = \frac{x\, y'(x)}{\int \lambda}$. ■
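
Lemma 14 lends itself to a quick numerical sanity check. The Python sketch below is an illustration under stated assumptions: it picks the (3,6)-regular ensemble ($\lambda(x) = x^2$, $\Lambda(x) = x^3$, $\rho(x) = x^5$), rewrites the left-hand side as an integral over $x$ using $dh = \Lambda'(y(x))\, y'(x)\, dx$, and uses a plain midpoint rule. All function names and the end points $x_a$, $x_b$ are hypothetical.

```python
def integrate(f, a, b, steps=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (k + 0.5) * width) for k in range(steps))

# Assumed example: (3,6)-regular ensemble.
lam = lambda z: z ** 2                # lambda(x) = x^2, so int lambda = 1/3
d_Lam = lambda z: 3.0 * z ** 2        # derivative of Lambda(x) = x^3
y = lambda x: 1.0 - (1.0 - x) ** 5    # y(x) = 1 - rho(1 - x)
d_y = lambda x: 5.0 * (1.0 - x) ** 4  # derivative of y
INT_LAMBDA = 1.0 / 3.0

def lhs(x_a, x_b):
    """Integral of eps(h) dh, written over x with dh = Lambda'(y(x)) y'(x) dx."""
    return integrate(lambda x: x / lam(y(x)) * d_Lam(y(x)) * d_y(x), x_a, x_b)

def rhs(x_a, x_b):
    """Right-hand side of Lemma 14."""
    return (x_b * y(x_b) - x_a * y(x_a) - integrate(y, x_a, x_b)) / INT_LAMBDA

left, right = lhs(0.2, 0.9), rhs(0.2, 0.9)
print(left, right)
```

The two printed numbers agree up to the quadrature error, reflecting the cancellation of $\lambda(y(x))$ between $\epsilon(x)$ and $\Lambda'(y(x)) = \lambda(y(x)) / \int \lambda$.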

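Lemma 15: Consider a dd pair $(\lambda, \rho)$ and any interval
$(x_a, x_b) \subseteq [0, 1]$, $x^{\text{BP}} \leq x_a$, over which $\epsilon(x) = \frac{x}{\lambda(y(x))}$
is increasing. Then, the function $h^{\text{BP}}(\epsilon)$ is continuous over
$(\epsilon_a, \epsilon_b)$, where $\epsilon_a = \epsilon(x_a)$ and $\epsilon_b = \epsilon(x_b)$, and

\[ \int_{\epsilon_a}^{\epsilon_b} h^{\text{BP}}(\epsilon)\,d\epsilon = \frac{1}{\int \lambda} \Big( \epsilon_b \int_0^{y(x_b)} \lambda(y)\,dy - \epsilon_a \int_0^{y(x_a)} \lambda(y)\,dy - x_b\, y(x_b) + x_a\, y(x_a) + \int_{x_a}^{x_b} y(x)\,dx \Big). \]

Proof: This is proved by, first, integrating by parts and,
second, using Lemma 14. ■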



B. Area under the BP EXIT Curve 

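Theorem 11 (Area Theorem for BP Decoding): Given a
dd pair $(\lambda, \rho)$ and the asymptotic BP EXIT entropy as defined
in Corollary 2, then

\[ r + \frac{1}{\int \lambda} \sum_{i=1}^{J} D_i = \int_0^1 h^{\text{BP}}(\epsilon)\,d\epsilon, \]

where $D_i = A_i - B_i - C_i$ with $A_i = \underline{x}^{i}\, y(\underline{x}^{i}) - \overline{x}^{i-1}\, y(\overline{x}^{i-1})$,
$B_i = \epsilon^{i} \int_{y(\overline{x}^{i-1})}^{y(\underline{x}^{i})} \lambda(y)\,dy$, and $C_i = \int_{\overline{x}^{i-1}}^{\underline{x}^{i}} y(x)\,dx$.

Proof: Using Corollary 2 we can derive (29),
where (a) comes from Lemma 15 and (b) uses the fact
that $\epsilon^{i} = \epsilon(\overline{x}^{i-1}) = \epsilon(\underline{x}^{i})$. ■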

First, observe that Theorem 11 quantifies the average suboptimality
of BP decoding compared to MAP decoding. The
area under the BP EXIT curve is trivially larger than or equal to
the design rate since the $D_i$'s are non-negative. Moreover, it
seems to indicate that a performance loss occurs at each
phase transition.

Second, Theorem 11 has a pleasing geometric interpretation
which goes back to the asymptotic analysis and which is
explained in Appendix IV.
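
The area identity can be tested numerically on an ensemble with a single jump. The Python sketch below is an illustration under explicit assumptions: it reads Theorem 11 as $\int_0^1 h^{\text{BP}}(\epsilon)\,d\epsilon = r + \frac{1}{\int \lambda} \sum_i D_i$, it uses the (3,6)-regular ensemble ($\lambda(x) = x^2$, $\Lambda(x) = x^3$, $\rho(x) = x^5$, $r = 1/2$), and it takes the single jump at $\epsilon^{\text{BP}}$ to go from $x = 0$ to $x = x^{\text{BP}}$. The function names, the ternary search and the trapezoidal integration along the curve are hypothetical implementation choices.

```python
def y(x):
    """y(x) = 1 - rho(1 - x) with rho(x) = x^5."""
    return 1.0 - (1.0 - x) ** 5

def eps_of_x(x):
    """EBP parametrization eps(x) = x / lambda(y(x)) with lambda(x) = x^2."""
    return x / y(x) ** 2

def h_of_x(x):
    """EBP EXIT value Lambda(y(x)) with Lambda(x) = x^3."""
    return y(x) ** 3

def argmin_unimodal(f, lo, hi, iterations=200):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x_bp = argmin_unimodal(eps_of_x, 1e-3, 1.0)
eps_bp = eps_of_x(x_bp)

# Left side: area under the BP EXIT curve, integrated along the branch
# x in [x_bp, 1] with the trapezoidal rule in the (eps, h) plane.
steps = 100_000
area = 0.0
for k in range(steps):
    xa = x_bp + (1.0 - x_bp) * k / steps
    xb = x_bp + (1.0 - x_bp) * (k + 1) / steps
    area += 0.5 * (h_of_x(xa) + h_of_x(xb)) * (eps_of_x(xb) - eps_of_x(xa))

# Right side: r + D_1 / int lambda, using closed-form integrals.
int_lambda, rate = 1.0 / 3.0, 0.5
A1 = x_bp * y(x_bp)
B1 = eps_bp * y(x_bp) ** 3 / 3.0               # eps^BP * int_0^{y(x_bp)} y^2 dy
C1 = x_bp - (1.0 - (1.0 - x_bp) ** 6) / 6.0    # int_0^{x_bp} y(x) dx
D1 = A1 - B1 - C1
rhs = rate + D1 / int_lambda
print(eps_bp, area, rhs)
```

For this ensemble $D_1$ is strictly positive, so the area under the BP EXIT curve exceeds the design rate by $D_1 / \int \lambda$, in line with the remark that a performance loss is incurred at the phase transition.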



Appendix IV
Dynamic Interpretation of the Average Gap
between MAP and BP Decoding

It is now well-known that the determination of capacity- 
achieving sequences on the erasure channel reduces to a curve- 
fitting problem, see, e.g., [50], [40]. This was the motivation 
for the Area Theorem and - so far - its unique application. Let 
us recall this view. For the purpose of illustration, and without 
essential loss of generality, we focus on the case of (G)LDPC 
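ensembles.

\begin{align*}
\int_0^1 h^{\text{BP}}(\epsilon)\,d\epsilon
&= \int_0^{\epsilon^{\text{BP}}} h^{\text{BP}}(\epsilon)\,d\epsilon + \sum_{i=1}^{J} \int_{\epsilon^{i}}^{\epsilon^{i+1}} h^{\text{BP}}(\epsilon)\,d\epsilon \\
&\overset{(a)}{=} \frac{1}{\int \lambda} \sum_{i=1}^{J} \left[ \epsilon(x) \int_0^{y(x)} \lambda(y)\,dy - x\,y(x) + \int_0^{x} y(x')\,dx' \right]_{\underline{x}^{i}}^{\overline{x}^{i}} \\
&\overset{(b)}{=} \frac{1}{\int \lambda} \left( \int_0^1 \lambda(y)\,dy - \sum_{i=1}^{J} \epsilon^{i} \int_{y(\overline{x}^{i-1})}^{y(\underline{x}^{i})} \lambda(y)\,dy - \Big( 1 - \sum_{i=1}^{J} \big[ x\,y(x) \big]_{\overline{x}^{i-1}}^{\underline{x}^{i}} \Big) + \int_0^1 y(x)\,dx - \sum_{i=1}^{J} \int_{\overline{x}^{i-1}}^{\underline{x}^{i}} y(x)\,dx \right) \\
&= \frac{\int \lambda - 1 + \int y}{\int \lambda} + \frac{1}{\int \lambda} \sum_{i=1}^{J} \left( \big[ x\,y(x) \big]_{\overline{x}^{i-1}}^{\underline{x}^{i}} - \epsilon^{i} \int_{y(\overline{x}^{i-1})}^{y(\underline{x}^{i})} \lambda(y)\,dy - \int_{\overline{x}^{i-1}}^{\underline{x}^{i}} y(x)\,dx \right). \tag{29}
\end{align*}

Fig. 26. Iterative decoding trajectory for the ensemble LDPC$(n, x^3, x^4)$ (in
the limit when $n \to \infty$): increasing values of the channel parameter $\epsilon$.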

A. EXIT Chart 

Fig. 26 summarizes the DE analysis of BP decoding by
showing the convergence of the recursive sequence formed by
the edge entropy $\{x_t\}_t$ (i.e., the edge erasure probability).
Such a representation (which emphasizes two component
EXIT functions, one associated to the left nodes and one
associated to the right nodes) is called an EXIT chart in [11]. This
representation is (asymptotically) exact for the binary erasure
channel (since it is DE) whereas it is only approximate in the
general case.

Fig. 27. Additive gap to capacity for the dd pair $(x^3, x^4)$.

Fig. 27 represents the EXIT chart when transmission takes
place at the BP threshold $\epsilon = \epsilon^{\text{BP}}$. The EXIT functions are here
the ones associated to the components of the LDPC ensemble.
The function on the left is associated to repetition codes on
the left while the one on the right is associated to parity-check
codes. At channel parameter $\epsilon = \epsilon^{\text{BP}}$, the two EXIT curves are
tangent at $(x^{\text{BP}}, y^{\text{BP}})$ and the EXIT chart also offers a graphical
representation of the limiting gap to capacity of the LDPC
ensemble. The additive gap $C(\epsilon^{\text{BP}}) - r$ to the Shannon threshold
is indeed represented by the entire white area $\mathcal{D}$ such that

\[ C(\epsilon^{\text{BP}}) - r = \epsilon^{\text{Sh}} - \epsilon^{\text{BP}} = \frac{\mathcal{D}}{\int \lambda}, \]

where $\frac{1}{\int \lambda} = \Lambda'(1)$ is the average left degree. In words,
the area $\mathcal{D}$ is the area between the left EXIT curve $x \mapsto
\lambda^{-1}(x / \epsilon^{\text{BP}})$ (at the BP threshold) and the right EXIT curve
$x \mapsto 1 - \rho(1 - x)$, which is bounded by the unit
square. This statement is presented, e.g., in [40]. We will now
refine this statement by applying the Area Theorem to the
EXIT curve of the LDPC ensemble (i.e.,
using the basic principle of our method). We will see that, in
short, the area $\mathcal{D}$ can itself be divided into two parts, where the
subarea below $x^{\text{BP}}$ represents the average gap between MAP
and BP decoding. The determination of LDPC codes for which
BP decoding is MAP then reduces again to a curve-fitting
problem below $x^{\text{BP}}$.
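
The relation between the white area and the additive gap can be verified numerically. The Python sketch below is an illustration under stated assumptions: it uses the (3,6)-regular ensemble ($\lambda(x) = x^2$, $\rho(x) = x^5$, $r = 1/2$, $\int \lambda = 1/3$) rather than the ensemble of the figure, it caps the left curve at the top edge of the unit square, and it locates $\epsilon^{\text{BP}}$ by a simple grid search. The function names and grid sizes are hypothetical.

```python
import math

def right_curve(x):
    """Right EXIT curve x -> 1 - rho(1 - x) with rho(x) = x^5."""
    return 1.0 - (1.0 - x) ** 5

def left_curve(x, eps):
    """Left EXIT curve x -> lambda^{-1}(x / eps), lambda(x) = x^2, capped at 1."""
    return min(1.0, math.sqrt(x / eps))

def bp_threshold(grid=200_000):
    """Grid minimum over x in (0, 1] of x / lambda(y(x))."""
    return min((k / grid) / right_curve(k / grid) ** 2 for k in range(1, grid + 1))

eps_bp = bp_threshold()

# Midpoint rule for the area between the two curves inside the unit square.
steps = 200_000
area = sum(left_curve((k + 0.5) / steps, eps_bp) - right_curve((k + 0.5) / steps)
           for k in range(steps)) / steps

gap_from_area = area / (1.0 / 3.0)   # D / int lambda
gap_direct = (1.0 - 0.5) - eps_bp    # eps^Sh - eps^BP with eps^Sh = 1 - r
print(eps_bp, gap_from_area, gap_direct)
```

The agreement of the two gap values reflects the elementary computation $\mathcal{D} = \int \rho - \epsilon^{\text{BP}} \int \lambda$, which after division by $\int \lambda$ gives $(1 - r) - \epsilon^{\text{BP}}$.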

B. Geometric Interpretation at the Component Level 

Fig. 28 shows a geometric representation of Theorem 11.
In (a) one sees that the additive gap between the BP threshold and
the Shannon threshold is represented by the total area between
the component EXIT functions. Further, the part of this area
which corresponds to the average gap between MAP and BP
decoding is $D_1$ as defined in Theorem 11.